Introduction



The growth of mathematical programming since its birth in 1947 has been marked and stimulated by a distinguished series of Symposia at which most of the research workers in the field have presented their work. The quality of these meetings has been ensured by a certain continuity of participants and supporters. The first such meeting was held in Chicago in 1949, sponsored by The RAND Corporation; selected papers from it appeared as "Activity Analysis of Production and Allocation" (1951), edited by Tjalling C. Koopmans. The first meeting to bear the title "Symposium" was held in Washington, D. C. in 1951 under the sponsorship of the National Bureau of Standards and the U. S. Air Force; abstracts and selected papers were published in "Symposium on Linear Inequalities and Programming" by the Air Force (1952). The "Second Symposium in Linear Programming" was held in Washington in 1955, under the same sponsorship as the first; its Proceedings, edited by Henry A. Antosiewicz, were published by the sponsors in 1955. The third symposium, "A Symposium on Mathematical Programming: Linear Programming and Recent Extensions," was held in Santa Monica, California in 1959; a report, "The RAND Symposium on Mathematical Programming," was published by The RAND Corporation in 1960.

The fourth symposium, on which this volume is based, was held in Chicago, Illinois, under the joint sponsorship of the Association for Computing Machinery, the Graduate School of Business of the University of Chicago, and The RAND Corporation, on June 18-22, 1962. The organizers of the present conference were drawn from the membership of SIGMAP, the Special Interest Group for Mathematical Programming in the Association for Computing Machinery. SIGMAP is devoted to furthering all aspects of mathematical programming.

Forty-three papers were presented at the Symposium, to an audience of more than 240 persons from five continents. Papers were given in each of eight areas of mathematical programming. In four of these, specially invited survey papers gave the audience a broad view of the methods and problems of that area. The surveys are among the twenty-three papers appearing in full in these Proceedings. The remaining papers, whose appearance in full here has been sacrificed to the requirement that this volume be of reasonable size and to some authors' commitments to publish elsewhere, are given as abstracts. Full copies of such papers can generally be obtained from the authors. Papers 14, 16, and 38 were edited by the authors from transcriptions made at the Symposium.

The first nine papers deal with the general theory of mathematical programming. The survey paper by Tucker, which begins this group, presents the theory of linear programming in its most powerful current form. The eight papers which follow deal with various aspects of the theory of both linear and nonlinear programming.

Papers 10 through 13 deal with nonlinear programming, with "nonlinear" used in the accepted sense as referring to "reasonably smooth" nonlinear objective functions and constraints. The first of these papers, by Wolfe, is a survey of most of the proposals which have been made for solving such problems.

Papers 14 through 18 are devoted to stochastic programming: mathematical programming problems whose data may be random variables. Such problems require careful statement if meaningful results are to be obtained. The first of these papers, by Madansky, surveys the outstanding problems of this field.

Papers 19 through 22 are concerned with computational procedures for very large linear programming problems. Such problems always have some regularity of structure that appropriately designed algorithms can take advantage of. The papers of this group constitute four different methods of attack on aspects of that regularity.

Papers 23 through 26 are concerned more intimately with the computational processes of mathematical programming, examining in detail ways in which variations of the simplex algorithm, the most effective current tool for linear programming, can be made even more effective.

Papers 27 through 33 lie in the area of applications of mathematical programming. Since entire series of books are being written on this subject, these papers can do no more than sample the area by showing some of the latest uses of currently available methods.

Papers 34 through 37 are devoted to integer programming problems. The first of these is Gomory's 1958 paper, now almost a classic, which established the subject of integer programming. It has been substituted for his paper presented at the Symposium because it is still in great demand but has been out of print for some time.

The last group of papers, 38 through 43, is concerned with problems of network flow, a special class of linear programming problems which, because of its rich structure, has given rise to an extensive and elegant theory. The first paper, by Fulkerson, surveys this area, reviewing its main tools and its outstanding problems.
Combinatorial Theory Underlying Linear Programs†



Albert W. Tucker

†This paper has been prepared with the assistance of Drs. Michel L. Balinski and Robert R. Singleton and with the support of the Office of Naval Research, Logistics Branch.

The simplex method of G. B. Dantzig is much more than the basic computational tool of linear programming. It is also a combinatorial algorithm which provides constructive means of establishing fundamental theorems, not only in programming but also in cognate areas, such as Farkas' theorem for linear inequalities and von Neumann's minimax theorem for matrix games. In an effort to discover the underlying theoretical structure, the author has been led to develop a combinatorial linear algebra which employs pivot steps (Gauss-Jordan elimination) and interchanges of rows and of columns to generate finite equivalence classes of "dual linear systems." Such a class embraces in palpable form all the information needed to treat many theoretical and practical matters customarily handled by more elaborate linear algebra. Over an ordered field this algebra seems to provide a unified and simplified means of dealing with linear inequalities, linear programs, matrix games, etc.



DUAL LINEAR SYSTEMS.
The schema

$$
\begin{array}{c|ccc|l}
 & y_{m+1} & \cdots & y_{m+n} & \\ \hline
x_1 & a_{11} & \cdots & a_{1n} & = -y_1 \\
\vdots & \vdots & & \vdots & \quad\vdots \\
x_m & a_{m1} & \cdots & a_{mn} & = -y_m \\ \hline
 & = x_{m+1} & \cdots & = x_{m+n} &
\end{array}
\tag{1}
$$

with $X^1 = (x_1, \dots, x_m)$ at the left, $Y^2$ (the variables $y_{m+1}, \dots, y_{m+n}$) at the top, the matrix $A$ in the body, $-Y^1$ at the right and $X^2 = (x_{m+1}, \dots, x_{m+n})$ at the bottom, is a succinct joint representation of two systems of linear equations. One system arises by forming the inner products of the vector $X^1$ with the columns of the matrix $A$, and setting each inner product equal to the corresponding component of $X^2$. This leads to a system of $n$ linear equations in the $m + n$ variables $x_j$:

$$
\begin{aligned}
x_1 a_{11} + \dots + x_m a_{m1} &= x_{m+1}, \\
&\ \ \vdots \\
x_1 a_{1n} + \dots + x_m a_{mn} &= x_{m+n}.
\end{aligned}
\tag{2}
$$

The second system arises by forming the inner products of $Y^2$ with the rows of $A$, and setting each equal to the corresponding component of $-Y^1$. This yields a system of $m$ linear equations in the $m + n$ variables $y_j$:

$$
\begin{aligned}
a_{11} y_{m+1} + \dots + a_{1n} y_{m+n} &= -y_1, \\
&\ \ \vdots \\
a_{m1} y_{m+1} + \dots + a_{mn} y_{m+n} &= -y_m.
\end{aligned}
\tag{3}
$$

In matrix notation these systems are written

$$X^1 A = X^2 \quad\text{and}\quad A Y^2 = -Y^1,$$

where $X^1, X^2$ are row vectors and $Y^1, Y^2$ are column vectors.

The two systems are dual, in the sense that any solution $X$ of the x-system is orthogonal to any solution $Y$ of the y-system. This is because

$$XY = x_1 y_1 + \dots + x_{m+n} y_{m+n} = X^1 Y^1 + X^2 Y^2 = X^1(-AY^2) + (X^1 A) Y^2 = 0. \tag{4}$$

For solutions $X$ and $Y$ which are nonnegative (i.e., $x_i \ge 0$ and $y_i \ge 0$, $i = 1, \dots, m+n$), the orthogonality $XY = 0$ yields a strong condition on individual components: namely, $x_i = 0$ or/and $y_i = 0$ for $i = 1, \dots, m+n$. This is because the inner product $XY$ is then a sum of nonnegative terms $x_i y_i$, which sum equals zero only if each individual term $x_i y_i$ equals zero.
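As a quick numerical illustration of (4), the following Python sketch builds solutions of the two dual systems from arbitrary free parts $X^1$ and $Y^2$, using the $2 \times 3$ matrix of Table 1 as an assumed example, and checks that $XY = 0$. A second, nonnegative pair shows the componentwise condition $x_i y_i = 0$. The function names are illustrative only.

```python
# Illustrative sketch: complete solutions of the dual systems X^1 A = X^2 and
# A Y^2 = -Y^1 from their free parts, then check the orthogonality (4).

def complete_x(A, X1):
    """Return X = [X^1, X^2] with X^2 = X^1 A (the x-system (2))."""
    m, n = len(A), len(A[0])
    X2 = [sum(X1[i] * A[i][j] for i in range(m)) for j in range(n)]
    return list(X1) + X2

def complete_y(A, Y2):
    """Return Y = [Y^1, Y^2] with Y^1 = -A Y^2 (the y-system (3))."""
    m, n = len(A), len(A[0])
    Y1 = [-sum(A[i][j] * Y2[j] for j in range(n)) for i in range(m)]
    return Y1 + list(Y2)

def inner(X, Y):
    return sum(x * y for x, y in zip(X, Y))

A = [[0, 3, 4],
     [-2, 5, 9]]                     # the matrix of Table 1 (m = 2, n = 3)

X = complete_x(A, [1, 2])            # arbitrary X^1
Y = complete_y(A, [1, -1, 3])        # arbitrary Y^2
print(X, Y, inner(X, Y))

# A nonnegative pair: every product x_i * y_i vanishes, not just their sum.
Xp = complete_x(A, [1, 0])
Yp = complete_y(A, [1, 0, 0])
print(Xp, Yp, [x * y for x, y in zip(Xp, Yp)])
```

Here $X = (1, 2, -4, 13, 22)$ and $Y = (-9, -20, 1, -1, 3)$ have inner product 0, and the nonnegative pair $(1, 0, 0, 3, 4)$, $(0, 2, 1, 0, 0)$ has all products zero.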

By a natural generalization of ordinary analytic geometry, solutions $X$ and $Y$ may be regarded as $(m+n)$-tuples of coordinates specifying points in an $(m+n)$-dimensional space. Then the set of all solutions $X$ and the set of all solutions $Y$ constitute two complementary orthogonal linear subspaces in the $(m+n)$-space. The system $X^2 = X^1 A$ specifies an $m$-dimensional linear subspace because the $m$ components of $X^1$ can be taken arbitrarily and then the components of $X^2$ are determined. Similarly, $AY^2 = -Y^1$ specifies an $n$-dimensional linear subspace because the $n$ components of $Y^2$ can be taken arbitrarily and then the components of $Y^1$ are determined. The two linear subspaces are orthogonal because of (4); they are complementary because the sum of their dimensions is $m + n$, the dimension of the containing space. Thus, each linear subspace determines the other.
Fig. 1. The m-subspace $X^2 = X^1 A$ and the n-subspace $X^1 = X^2(-A^T)$, drawn schematically against the $X^1$ and $X^2$ axes.

The complementary-orthogonal nature of the two subspaces is indicated schematically in Figure 1, where the matrix equation $Y^1 = -AY^2$ for the $n$-dimensional subspace has been rewritten as $X^1 = X^2(-A^T)$ by transposing and then substituting $X^1, X^2$ for $Y^{1T}, Y^{2T}$.

In plane analytic geometry two orthogonal straight lines through the origin have equations $y = ax$ and $x = -ay$, where $a = y/x$ is the usual slope of the first line and $-a = x/y$ is the reciprocal slope of the second line. By analogy, the matrix $A$ can be regarded as the "$X^2{:}X^1$-slope" of the linear subspace specified by $X^2 = X^1 A$, and the negative-transpose matrix $-A^T$ as the "$X^1{:}X^2$-slope" of the complementary-orthogonal linear subspace specified by $X^1 = X^2(-A^T)$.



COMBINATORIAL EQUIVALENCE. 

We will now see that the process of Gauss-Jordan elimination can be applied simultaneously to the x-system (2) and the y-system (3) to pass from one pair of dual linear systems to a second pair of dual linear systems which are equivalent in the sense that the second systems have the same solutions as the first systems. Suitably organized, this use of Gauss-Jordan elimination leads to the concept of "combinatorial equivalence," so called because each equivalence class contains just a finite number of members.

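If a coefficient $a_{ij} \ne 0$ in schema (1), the two equations in which it enters,

$$
\begin{aligned}
x_1 a_{1j} + \dots + x_i a_{ij} + \dots + x_m a_{mj} &= x_{m+j}, \\
a_{i1} y_{m+1} + \dots + a_{ij} y_{m+j} + \dots + a_{in} y_{m+n} &= -y_i,
\end{aligned}
\tag{5}
$$

can be solved for $x_i$ and $y_{m+j}$ to give

$$
\begin{aligned}
x_1 \left(-\frac{a_{1j}}{a_{ij}}\right) + \dots + x_{m+j}\,\frac{1}{a_{ij}} + \dots + x_m \left(-\frac{a_{mj}}{a_{ij}}\right) &= x_i, \\
\frac{a_{i1}}{a_{ij}}\, y_{m+1} + \dots + \frac{1}{a_{ij}}\, y_i + \dots + \frac{a_{in}}{a_{ij}}\, y_{m+n} &= -y_{m+j}.
\end{aligned}
$$

These can be used to eliminate $x_i$ and $y_{m+j}$ from the remaining equations, but of course $x_{m+j}$ and $y_i$ are introduced into the left sides. When this is done, the new coefficient for $x_h$ ($h \ne i$) in equation $k$ ($k \ne j$) of (2) is

$$a_{hk} - \frac{a_{hj}\, a_{ik}}{a_{ij}},$$

which is also the coefficient for $y_{m+k}$ in equation $h$ of (3).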

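We are thus led to a transformation of the schema (1), called an "elementary pivot transformation" (or briefly, "pivot step"), which may be represented as follows:

$$
\begin{array}{c|cc|l}
 & y_{m+j} & y_{m+k} & \\ \hline
x_i & \alpha & \beta & = -y_i \\
x_h & \gamma & \delta & = -y_h \\ \hline
 & = x_{m+j} & = x_{m+k} &
\end{array}
\qquad\longrightarrow\qquad
\begin{array}{c|cc|l}
 & y_i & y_{m+k} & \\ \hline
x_{m+j} & 1/\alpha & \beta/\alpha & = -y_{m+j} \\
x_h & -\gamma/\alpha & \delta - \beta\gamma/\alpha & = -y_h \\ \hline
 & = x_i & = x_{m+k} &
\end{array}
$$

The pivot $\alpha$ is any element $a_{ij} \ne 0$, $\beta$ is any other entry $a_{ik}$ in the pivot's row, $\gamma$ is any other entry $a_{hj}$ in the pivot's column, and $\delta$ is the entry $a_{hk}$ in $\beta$'s column and $\gamma$'s row. In the margins $x_i$ and $x_{m+j}$ have been interchanged, as have $y_i$ and $y_{m+j}$. In the interchange of the y's the minus sign stays with the row. All other marginal variables remain unchanged.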
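The pivot rule is mechanical enough to state as code. The Python sketch below (exact arithmetic via `fractions.Fraction`) applies one pivot step to a schema stored as a matrix plus two lists of subscripts, one for the row variables and one for the column variables; the function name and this label representation are assumptions made for illustration. Starting from schema (1) of Table 1 and pivoting successively at the positions of its starred entries, it ends with row variables $x_4, x_5$ and column variables $y_1, y_2, y_3$.

```python
from fractions import Fraction as F

def pivot(A, rows, cols, i, j):
    """One elementary pivot step on the schema with matrix A.

    rows[h] is the subscript of the x at the left of row h (and of the -y at
    its right); cols[k] is the subscript of the y at the top of column k (and
    of the x below it). Returns a new matrix and new label lists.
    """
    alpha = F(A[i][j])
    if alpha == 0:
        raise ValueError("the pivot must be nonzero")

    def entry(h, k):
        if (h, k) == (i, j):
            return 1 / alpha                                   # alpha -> 1/alpha
        if h == i:
            return F(A[i][k]) / alpha                          # beta -> beta/alpha
        if k == j:
            return -F(A[h][j]) / alpha                         # gamma -> -gamma/alpha
        return F(A[h][k]) - F(A[i][k]) * F(A[h][j]) / alpha    # delta -> delta - beta*gamma/alpha

    new = [[entry(h, k) for k in range(len(A[0]))] for h in range(len(A))]
    rows, cols = list(rows), list(cols)
    rows[i], cols[j] = cols[j], rows[i]   # x_i <-> x_{m+j} and y_i <-> y_{m+j}
    return new, rows, cols

# Schema (1) of Table 1 and the (row, column) positions of successive pivots.
A, rows, cols = [[0, 3, 4], [-2, 5, 9]], [1, 2], [3, 4, 5]
for i, j in [(1, 0), (1, 1), (1, 2), (0, 0), (1, 2), (0, 1), (1, 2), (0, 2)]:
    A, rows, cols = pivot(A, rows, cols, i, j)
    print(rows, cols, [[str(a) for a in r] for r in A])
```

Keeping the labels as separate lists mirrors the text: the matrix positions never move, only the marginal variables trade places.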

In addition to pivot steps we admit two other types of elementary transformations on a schema: interchange of any two rows and interchange of any two columns. These are trivial modifications corresponding merely to the interchange of two equations in one system and of corresponding terms in the other system. Clearly the solutions of the two systems are not changed except for the order in which the variables appear.

Note that the schema (1) imposes an ordered partition

$$(1, \dots, m \mid m+1, \dots, m+n)$$

of the subscripts of the variables. An interchange of two rows permutes two of the subscripts $1, \dots, m$; an interchange of two columns permutes two of the subscripts $m+1, \dots, m+n$; a pivot step interchanges one subscript from $1, \dots, m$ with one from $m+1, \dots, m+n$.

A finite succession of elementary transformations of the three types results in a transformed partition

$$(\bar 1, \dots, \bar m \mid \overline{m+1}, \dots, \overline{m+n})$$

and transformed schema

$$
\begin{array}{c|ccc|l}
 & y_{\overline{m+1}} & \cdots & y_{\overline{m+n}} & \\ \hline
x_{\bar 1} & \bar a_{11} & \cdots & \bar a_{1n} & = -y_{\bar 1} \\
\vdots & \vdots & & \vdots & \quad\vdots \\
x_{\bar m} & \bar a_{m1} & \cdots & \bar a_{mn} & = -y_{\bar m} \\ \hline
 & = x_{\overline{m+1}} & \cdots & = x_{\overline{m+n}} &
\end{array}
\tag{6}
$$

with matrix $A_\pi = (\bar a_{ij})$, where $\pi$ denotes the permutation of $m+n$ objects which carries $(1, \dots, m+n)$ into $(\bar 1, \dots, \overline{m+n})$.

The new dual linear systems

$$X^1_\pi A_\pi = X^2_\pi \quad\text{and}\quad A_\pi Y^2_\pi = -Y^1_\pi$$

are equivalent respectively to the original dual linear systems $X^1 A = X^2$ and $AY^2 = -Y^1$, in that they have the same solutions, subject to the understanding that the correspondence between the solutions of one system and those of the equivalent system is a one-to-one correspondence established by the permutation $\pi$.

Define any two schemata, such as (1) and (6), to be combinatorially equivalent if the dual linear systems $X^1_\pi A_\pi = X^2_\pi$ and $A_\pi Y^2_\pi = -Y^1_\pi$ have the same solutions as the dual linear systems $X^1 A = X^2$ and $AY^2 = -Y^1$, where $X_\pi = [X^1_\pi, X^2_\pi]$ arises from $X = [X^1, X^2]$ by a permutation $\pi$ of the $m+n$ component variables and $Y_\pi$ arises similarly from $Y$. Then, as seen above, a finite succession of elementary transformations of the three types leads from a schema (1) to a combinatorially equivalent schema (6). Conversely, it can be shown that it is always possible to find a finite succession of elementary transformations of the three types leading from a schema to any combinatorially equivalent schema.
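Combinatorial equivalence can be watched directly. The sketch below (Python with exact fractions; the helper names are hypothetical) fixes one solution $x$ of the x-system and one solution $y$ of the y-system of Table 1's schema (1), indexes their components by subscript, and checks after each of several pivot steps that the relabeled vectors still satisfy $X^1_\pi A_\pi = X^2_\pi$ and $A_\pi Y^2_\pi = -Y^1_\pi$. A vector that is not a solution is rejected.

```python
from fractions import Fraction as F

def pivot(A, rows, cols, i, j):
    """Pivot step of the text (alpha, beta, gamma, delta rule) plus the label swap."""
    a = F(A[i][j])

    def entry(h, k):
        if (h, k) == (i, j):
            return 1 / a                                   # alpha
        if h == i:
            return F(A[i][k]) / a                          # beta
        if k == j:
            return -F(A[h][j]) / a                         # gamma
        return F(A[h][k]) - F(A[i][k]) * F(A[h][j]) / a    # delta

    new = [[entry(h, k) for k in range(len(A[0]))] for h in range(len(A))]
    rows, cols = list(rows), list(cols)
    rows[i], cols[j] = cols[j], rows[i]
    return new, rows, cols

def satisfies(A, rows, cols, x, y):
    """Check X^1 A = X^2 and A Y^2 = -Y^1, reading components through the labels."""
    m, n = len(A), len(A[0])
    x_ok = all(sum(x[rows[h]] * A[h][k] for h in range(m)) == x[cols[k]] for k in range(n))
    y_ok = all(sum(A[h][k] * y[cols[k]] for k in range(n)) == -y[rows[h]] for h in range(m))
    return x_ok and y_ok

A, rows, cols = [[0, 3, 4], [-2, 5, 9]], [1, 2], [3, 4, 5]
x = {1: 1, 2: 2, 3: -4, 4: 13, 5: 22}     # X^1 = (1, 2), X^2 = X^1 A
y = {1: -9, 2: -20, 3: 1, 4: -1, 5: 3}    # Y^2 = (1, -1, 3), Y^1 = -A Y^2
checks = [satisfies(A, rows, cols, x, y)]
for i, j in [(1, 0), (1, 1), (1, 2), (0, 0)]:
    A, rows, cols = pivot(A, rows, cols, i, j)
    checks.append(satisfies(A, rows, cols, x, y))
print(checks)

bad = dict(x)
bad[5] = 0                                # no longer on the x-subspace
print(satisfies(A, rows, cols, bad, y))
```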

Table 1 shows a numerical example of combinatorially equivalent schemata, generated in this case by a single cycle of pivot steps. The set shown is essentially complete, in the sense that all other schemata combinatorially equivalent to these may be found by permutations only. The pivots are starred to permit the reader to check the transformations, the fractional entries being left unreduced as a further aid. The complete equivalence class consists of all row and column permutations of these nine representations, or

$$9 \times 2! \times 3! = 9 \times 12 = 108$$

in all. The maximum possible number for a $2 \times 3$ matrix is $(2 + 3)! = 120$. However, because of the zero entry in schema (1), one representation and its 12 permutations are lacking.

Table 1

A SMALL EXAMPLE OF COMBINATORIALLY EQUIVALENT DUAL LINEAR SYSTEMS

The schema

$$
(1)\quad
\begin{array}{c|ccc|l}
 & y_3 & y_4 & y_5 & \\ \hline
x_1 & 0 & 3 & 4 & = -y_1 \\
x_2 & -2^{*} & 5 & 9 & = -y_2 \\ \hline
 & = x_3 & = x_4 & = x_5 &
\end{array}
$$

describes the following pair of dual systems of linear equations:

$$
\begin{aligned} -2x_2 &= x_3 \\ 3x_1 + 5x_2 &= x_4 \\ 4x_1 + 9x_2 &= x_5 \end{aligned}
\qquad\qquad
\begin{aligned} 3y_4 + 4y_5 &= -y_1 \\ -2y_3 + 5y_4 + 9y_5 &= -y_2. \end{aligned}
$$

From the schema (1) the following combinatorially equivalent schemata are formed by successive pivot steps, employing the starred entries as pivots.

$$
(2)\ \begin{array}{c|ccc|l} & y_2 & y_4 & y_5 & \\ \hline x_1 & 0 & 6/2 & 8/2 & =-y_1 \\ x_3 & -1/2 & -5/2^{*} & -9/2 & =-y_3 \\ \hline & =x_2 & =x_4 & =x_5 & \end{array}
\qquad
(3)\ \begin{array}{c|ccc|l} & y_2 & y_3 & y_5 & \\ \hline x_1 & -3/5 & 6/5 & -7/5 & =-y_1 \\ x_4 & 1/5 & -2/5 & 9/5^{*} & =-y_4 \\ \hline & =x_2 & =x_3 & =x_5 & \end{array}
\qquad
(4)\ \begin{array}{c|ccc|l} & y_2 & y_3 & y_4 & \\ \hline x_1 & -4/9^{*} & 8/9 & 7/9 & =-y_1 \\ x_5 & 1/9 & -2/9 & 5/9 & =-y_5 \\ \hline & =x_2 & =x_3 & =x_4 & \end{array}
$$

$$
(5)\ \begin{array}{c|ccc|l} & y_1 & y_3 & y_4 & \\ \hline x_2 & -9/4 & -8/4 & -7/4 & =-y_2 \\ x_5 & 1/4 & 0 & 3/4^{*} & =-y_5 \\ \hline & =x_1 & =x_3 & =x_4 & \end{array}
\qquad
(6)\ \begin{array}{c|ccc|l} & y_1 & y_3 & y_5 & \\ \hline x_2 & -5/3 & -6/3^{*} & 7/3 & =-y_2 \\ x_4 & 1/3 & 0 & 4/3 & =-y_4 \\ \hline & =x_1 & =x_3 & =x_5 & \end{array}
\qquad
(7)\ \begin{array}{c|ccc|l} & y_1 & y_2 & y_5 & \\ \hline x_3 & 5/6 & -3/6 & -7/6 & =-y_3 \\ x_4 & 2/6 & 0 & 8/6^{*} & =-y_4 \\ \hline & =x_1 & =x_2 & =x_5 & \end{array}
$$

$$
(8)\ \begin{array}{c|ccc|l} & y_1 & y_2 & y_4 & \\ \hline x_3 & 9/8 & -4/8 & 7/8^{*} & =-y_3 \\ x_5 & 2/8 & 0 & 6/8 & =-y_5 \\ \hline & =x_1 & =x_2 & =x_4 & \end{array}
\qquad
(9)\ \begin{array}{c|ccc|l} & y_1 & y_2 & y_3 & \\ \hline x_4 & 9/7 & -4/7 & 8/7 & =-y_4 \\ x_5 & -5/7 & 3/7 & -6/7 & =-y_5 \\ \hline & =x_1 & =x_2 & =x_3 & \end{array}
$$

These nine schemata, along with those formed by row and column permutations, constitute the finite equivalence class (108 in all).

Table 2

A UNIMODULAR EXAMPLE OF COMBINATORIALLY EQUIVALENT DUAL LINEAR SYSTEMS

[Thirteen schemata, each identified by its row and column subscripts, among them (123:456), (124:356), (134:256), (135:246), (136:245), (235:146), (236:145), (245:136), (246:135), (345:126), and (346:125).]

The full equivalence class contains 468 (= 13 × 36) schemata, i.e., the above 13 and all their row and/or column permutations.
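The count of 108 can be confirmed by a short computation. The sketch below rests on a standard linear-algebra fact that the text does not spell out: since $X = X^1 [I \mid A]$ describes the x-subspace, a set of $m$ of the $m+n$ variables can sit at the left of an equivalent schema exactly when the corresponding $m$ columns of the $m \times (m+n)$ matrix $[I \mid A]$ form a nonsingular submatrix. Each such set then accounts for $m!\,n!$ schemata through row and column permutations. The function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def det(M):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d

def left_variable_sets(A):
    """Subscript sets (1-based) that can label the rows of an equivalent schema."""
    m, n = len(A), len(A[0])
    IA = [[1 if k == i else 0 for k in range(m)] + list(A[i]) for i in range(m)]
    return [tuple(s + 1 for s in S) for S in combinations(range(m + n), m)
            if det([[IA[i][k] for k in S] for i in range(m)]) != 0]

A = [[0, 3, 4], [-2, 5, 9]]                 # schema (1) of Table 1
sets = left_variable_sets(A)
m, n = len(A), len(A[0])
print(sets)
print(len(sets), len(sets) * factorial(m) * factorial(n))
```

The one missing set is $\{x_2, x_3\}$: reaching it would require pivoting on the zero entry $a_{11}$ of schema (1).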

Table 2 shows a second example which is convenient for computation, since its entries and all subdeterminants that can be formed from it have the value 1, -1, or 0, so that denominators do not occur. The reader may check that different sequences of elementary transformations may result in the same schema. Thus (labeling each schema by the subscripts of its row variables and of its column variables)

$$(123{:}456) \to (124{:}356) \to (134{:}256)$$

and

$$(123{:}456) \to (143{:}256) \to (134{:}256).$$



DUAL LINEAR PROGRAMS. 

We will now see that the algebra of dual linear systems, pivot steps, combinatorial equivalence, etc., developed above in homogeneous form (for an arbitrary number-field), can be employed in nonhomogeneous form (for an ordered number-field, such as the real numbers or the rational numbers) to treat dual linear programs and the Dantzig simplex method.

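Let it be required to

minimize $UB + d$ constrained by $U \ge 0$, $UA \ge C$, and
maximize $CV + d$ constrained by $AV \le B$, $V \ge 0$.

We reformulate these dual linear programs in the following schema:

$$
\begin{array}{c|c|ccc|l}
 & 1 & y_{m+1} & \cdots & y_{m+n} & \\ \hline
1 & a_{00} & a_{01} & \cdots & a_{0n} & = \eta \\ \hline
x_1 & a_{10} & a_{11} & \cdots & a_{1n} & = -y_1 \\
\vdots & \vdots & \vdots & & \vdots & \quad\vdots \\
x_m & a_{m0} & a_{m1} & \cdots & a_{mn} & = -y_m \\ \hline
 & = \xi & = x_{m+1} & \cdots & = x_{m+n} &
\end{array}
\tag{7}
$$

in which the border row carries $-C$, the border column carries $-B$, and the body is $A$; $\eta$ is to be minimized with the y's nonnegative, and $\xi$ is to be maximized with the x's nonnegative.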
where $a_{00} = -d$, $a_{0j} = -c_j$, $a_{i0} = -b_i$, $\xi = -UB - d$, $\eta = -CV - d$, $X^1 = U$, $X^2 = UA - C$, $Y^1 = B - AV$, $Y^2 = V$. The instructions minimize and maximize are interchanged because maximizing $\xi$ and minimizing $\eta$ are equivalent to minimizing $UB + d = -\xi$ and maximizing $CV + d = -\eta$. Note that the schema (7) differs from schema (1) in just two respects: the x's and y's are required to be nonnegative, and nonhomogeneity is introduced by the border column with marginal labels 1 and $\xi$ and by the border row with marginal labels 1 and $\eta$.

The Dantzig simplex algorithm consists of a nonrepeating sequence of pivot steps, using pivots not in the border row and column. Hence it must end in some terminal schema after a finite number of steps, since it remains within a finite (combinatorial equivalence) class of schemata. The four possible terminal schemata are given in the following theorem.†
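A minimal Python sketch of a simplex method working directly on schema (7) follows. It assumes $B \ge 0$, so that the y-program starts feasible (border column $\le 0$), and it keeps that column nonpositive with the usual ratio test. To guarantee a nonrepeating sequence it uses Bland's smallest-subscript rule, a standard anticycling rule swapped in here for illustration; it is not claimed to be Dantzig's own choice rule. The run stops in terminal form (I) or (III) of the theorem that follows. The names and the return format are hypothetical.

```python
from fractions import Fraction

def pivot(T, rows, cols, r, c):
    """Pivot step of the text on the full schema T (border row 0, border column 0)."""
    a = T[r][c]

    def entry(h, k):
        if (h, k) == (r, c):
            return 1 / a
        if h == r:
            return T[r][k] / a
        if k == c:
            return -T[h][c] / a
        return T[h][k] - T[r][k] * T[h][c] / a

    new = [[entry(h, k) for k in range(len(T[0]))] for h in range(len(T))]
    rows, cols = list(rows), list(cols)
    rows[r - 1], cols[c - 1] = cols[c - 1], rows[r - 1]
    return new, rows, cols

def simplex(A, B, C, d=0):
    """Maximize CV + d subject to AV <= B, V >= 0 (assumes B >= 0)."""
    m, n = len(A), len(A[0])
    T = [[Fraction(-d)] + [Fraction(-c) for c in C]]                          # 1 | -d, -C
    T += [[Fraction(-B[i])] + [Fraction(a) for a in A[i]] for i in range(m)]  # x_i | -b_i, A
    rows, cols = list(range(1, m + 1)), list(range(m + 1, m + n + 1))
    while True:
        cand_cols = [k for k in range(1, n + 1) if T[0][k] < 0]
        if not cand_cols:
            break                                        # border row >= 0: form (I)
        c = min(cand_cols, key=lambda k: cols[k - 1])    # Bland: smallest subscript
        cand_rows = [h for h in range(1, m + 1) if T[h][c] > 0]
        if not cand_rows:
            return "unbounded", None, None, None         # form (III): eta unbounded below
        r = min(cand_rows, key=lambda h: (-T[h][0] / T[h][c], rows[h - 1]))
        T, rows, cols = pivot(T, rows, cols, r, c)
    y = {lab: -T[h][0] for h, lab in enumerate(rows, start=1)}   # y's at the right
    x = {lab: T[0][k] for k, lab in enumerate(cols, start=1)}    # x's at the bottom
    U = [x.get(i, Fraction(0)) for i in range(1, m + 1)]
    V = [y.get(m + j, Fraction(0)) for j in range(1, n + 1)]
    return "optimal", -T[0][0], U, V                     # CV + d = -eta = -a_00

print(simplex([[1, 0], [0, 2], [3, 2]], [4, 12, 18], [3, 5]))
```

On the assumed data (maximize $3v_1 + 5v_2$ with $v_1 \le 4$, $2v_2 \le 12$, $3v_1 + 2v_2 \le 18$) the run takes three pivot steps and reports the value 36 with $V = (2, 6)$ and the dual solution $U = (0, 3/2, 1)$.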

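Theorem:

By a finite succession of pivot steps, excluding pivots in the border row and column, it is always possible to pass to a (combinatorially equivalent) terminal schema

$$
\begin{array}{c|c|ccc|l}
 & 1 & y_{\overline{m+1}} & \cdots & y_{\overline{m+n}} & \\ \hline
1 & \bar a_{00} & \bar a_{01} & \cdots & \bar a_{0n} & = \eta \\ \hline
x_{\bar 1} & \bar a_{10} & \bar a_{11} & \cdots & \bar a_{1n} & = -y_{\bar 1} \\
\vdots & \vdots & \vdots & & \vdots & \quad\vdots \\
x_{\bar m} & \bar a_{m0} & \bar a_{m1} & \cdots & \bar a_{mn} & = -y_{\bar m} \\ \hline
 & = \xi & = x_{\overline{m+1}} & \cdots & = x_{\overline{m+n}} &
\end{array}
\tag{8}
$$

(border row $-C_\pi$, border column $-B_\pi$, body $A_\pi$; $\eta$ min with y's $\ge 0$, $\xi$ max with x's $\ge 0$), which has one of the following four forms: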



†A compact inductive proof (to be published elsewhere) has recently been devised by the author along lines suggested by the paper of R. E. Gomory and M. L. Balinski presented at this Symposium, and by G. B. Dantzig's inductive proof [see IBM Journal 4, 1960, pp. 505-506] which assumes $B \ge 0$.
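$$
\text{(I)}\quad
\begin{array}{c|c|ccc|l}
 & 1 & 0 & \cdots & 0 & \\ \hline
1 & \bar a_{00} & \oplus & \cdots & \oplus & = \min \\ \hline
0 & \ominus & & & & = -\oplus \\
\vdots & \vdots & & & & \quad\vdots \\
0 & \ominus & & & & = -\oplus \\ \hline
 & = \max & = \oplus & \cdots & = \oplus &
\end{array}
\qquad\qquad
\text{(II)}\quad
\begin{array}{c|c|ccc|l}
 & 1 & \oplus & \cdots & \oplus & \\ \hline
1 & \bar a_{00} & \oplus & \cdots & \oplus & \\ \hline
0 & & & & & \\
\vdots & & & & & \\
+ & + & \oplus & \cdots & \oplus & = -(-) \\ \hline
 & = \xi & = \oplus & \cdots & = \oplus &
\end{array}
$$

$$
\text{(III)}\quad
\begin{array}{c|c|cccc|l}
 & 1 & 0 & \cdots & 0 & + & \\ \hline
1 & \bar a_{00} & & & & - & = \eta \\ \hline
\oplus & \ominus & & & & \ominus & = -\oplus \\
\vdots & \vdots & & & & \vdots & \quad\vdots \\
\oplus & \ominus & & & & \ominus & = -\oplus \\ \hline
 & & & & & = - &
\end{array}
\qquad\qquad
\text{(IV)}\quad
\begin{array}{c|c|cccc|l}
 & 1 & \oplus & \cdots & \oplus & \oplus & \\ \hline
1 & \bar a_{00} & & & & - & \\ \hline
\oplus & & & & & \ominus & \\
\vdots & & & & & \vdots & \\
\oplus & + & \oplus & \cdots & \oplus & 0 & = -(-) \\ \hline
 & & & & & = - &
\end{array}
$$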



Here $+$, $\oplus$, $0$, $\ominus$, and $-$ symbolize quantities which are positive, nonnegative, zero, nonpositive, and negative, respectively. Inside the four boxes above, they represent the actual nature of the entries $\bar a$ in the terminal schema (8). At the left and top margins, they represent values of $x_{\bar 1}, \dots, x_{\bar m}$ and $y_{\overline{m+1}}, \dots, y_{\overline{m+n}}$ which we will assign to determine the corresponding values of $x_{\overline{m+1}}, \dots, x_{\overline{m+n}}$ and $y_{\bar 1}, \dots, y_{\bar m}$ at the bottom and right.

Terminal form (I) is the "successful" one, yielding optimal solutions to both programs. As indicated in the border row and column of (I), $-B_\pi \le 0$ and $-C_\pi \ge 0$. If the marginal variables at the left and top are set equal to zero, nonnegative x's and y's are determined at the bottom and right, while $\xi$ and $\eta$ take on the common value $\bar a_{00}$. Such feasible solutions with $\xi = \eta$ are necessarily optimal, since for every pair of feasible solutions $X = [X^1, X^2]$, $Y = [Y^1, Y^2]$ we have $\eta - \xi = UB - CV = XY \ge 0$.
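The identity $\eta - \xi = XY$ used in the last sentence is easy to check numerically. This Python sketch uses an assumed small example (not data from the text): maximize $3v_1 + 2v_2$ subject to $v_1 + v_2 \le 4$, $v_1 + 3v_2 \le 6$, $V \ge 0$, with $d = 0$. It computes $\xi$, $\eta$ and $XY$ for $X = (U, UA - C)$ and $Y = (B - AV, V)$, first for a feasible but non-optimal pair and then for an optimal one, where the gap closes.

```python
def xi_eta_XY(A, B, C, d, U, V):
    """Return xi = -UB - d, eta = -CV - d and the inner product XY for schema (7)."""
    m, n = len(A), len(A[0])
    X = list(U) + [sum(U[i] * A[i][j] for i in range(m)) - C[j] for j in range(n)]  # (U, UA - C)
    Y = [B[i] - sum(A[i][j] * V[j] for j in range(n)) for i in range(m)] + list(V)  # (B - AV, V)
    xi = -sum(u * b for u, b in zip(U, B)) - d
    eta = -sum(c * v for c, v in zip(C, V)) - d
    return xi, eta, sum(x * y for x, y in zip(X, Y))

A, B, C, d = [[1, 1], [1, 3]], [4, 6], [3, 2], 0
print(xi_eta_XY(A, B, C, d, [3, 1], [1, 1]))   # feasible, not optimal
print(xi_eta_XY(A, B, C, d, [3, 0], [4, 0]))   # optimal pair
```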

In terminal form (II) the border row $-C_\pi$ is again nonnegative, but there is another row (shown above as the last row) which is nonnegative and has a positive entry in the border column $-B_\pi$. The x-program is feasible but its objective function $\xi$ is not bounded above. To see this take

$$x_{\bar 1} = \dots = x_{\overline{m-1}} = 0 \quad\text{and any}\quad x_{\bar m} \ge 0$$

to determine nonnegative x's at the bottom, and

$$\xi = \bar a_{00} + x_{\bar m}\, \bar a_{m0}, \qquad \bar a_{m0} > 0.$$

Then $\xi \to +\infty$ as $x_{\bar m} \to +\infty$, so no maximum exists. At the same time the y-program is infeasible, because $y_{\bar m} < 0$ for any assignment of nonnegative y's at the top.
Terminal form (III) is just the "negative transpose" of form (II). The x-program is now infeasible and the y-program is feasible but with objective $\eta$ not bounded below.

In terminal form (IV) there is a nonnegative row (shown as the last row) with positive entry in the border column and a nonpositive column (shown as the last column) with negative entry in the border row. Any assignment of nonnegative x's at the left and nonnegative y's at the top makes $x_{\overline{m+n}} < 0$ and $y_{\bar m} < 0$. Hence neither program is feasible.
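The four terminal forms can be recognized from signs alone. The Python function below is an illustrative sketch (its name and return values are assumptions): it takes a schema with the border row as row 0 and the border column as column 0 and reports which of the forms (I) to (IV) it exhibits, or None if no terminal pattern is present yet.

```python
def terminal_form(T):
    """Classify a schema by the sign patterns of forms (I)-(IV).

    T[0] is the border row (T[0][0] is a_00) and column 0 is the border
    column. Returns "I", "II", "III", "IV", or None if no pattern applies.
    """
    m, n = len(T) - 1, len(T[0]) - 1
    row_nonneg = all(T[0][k] >= 0 for k in range(1, n + 1))     # -C >= 0
    col_nonpos = all(T[h][0] <= 0 for h in range(1, m + 1))     # -B <= 0
    # a nonnegative row with a positive entry in the border column
    bad_row = any(T[h][0] > 0 and all(T[h][k] >= 0 for k in range(1, n + 1))
                  for h in range(1, m + 1))
    # a nonpositive column with a negative entry in the border row
    bad_col = any(T[0][k] < 0 and all(T[h][k] <= 0 for h in range(1, m + 1))
                  for k in range(1, n + 1))
    if row_nonneg and col_nonpos:
        return "I"
    if bad_row and bad_col:
        return "IV"
    if bad_row and row_nonneg:
        return "II"
    if bad_col and col_nonpos:
        return "III"
    return None

examples = {
    "I": [[-12, 3, 1], [-4, 1, 1], [-2, -1, 2]],
    "II": [[0, 1], [2, 3]],
    "III": [[0, -1], [-1, -1]],
    "IV": [[0, 1, -1], [1, 2, 0]],
}
for name, T in examples.items():
    print(name, terminal_form(T))
```

In the (IV) example the entry where the distinguished row and column meet is 0, as it must be, since it is both nonnegative and nonpositive.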



ANALYTIC GEOMETRY OF DUAL LINEAR PROGRAMS. 

The constraint equations of the x-program in schema (7) specify a linear manifold

$$P:\ X^2 = X^1 A - C$$

in $(m+n)$-space. P is an $m$-dimensional linear manifold which can be obtained by translating the $m$-dimensional linear subspace $X^2 = X^1 A$ parallel to itself until it intercepts "the $X^2$-axis" (i.e., the $n$-subspace $X^1 = 0$) at the point $X^1 = 0$, $X^2 = -C$. Similarly the constraint equations of the y-program in schema (7) specify a linear manifold

$$Q:\ X^1 = X^2(-A^T) + B^T$$

in the $(m+n)$-space, where this matrix equation is obtained from $Y^1 = -AY^2 + B$ by transposing and then substituting $X^1, X^2$ for $Y^{1T}, Y^{2T}$. Q is an $n$-dimensional linear manifold which can be obtained by translating the $n$-dimensional linear subspace $X^1 = X^2(-A^T)$ parallel to itself until it intercepts "the $X^1$-axis" (i.e., the $m$-subspace $X^2 = 0$) at the point $X^1 = B^T$, $X^2 = 0$. Hence P and Q are linear manifolds of complementary dimensions, $m$ and $n$, which are orthogonal because they are parallel, respectively, to the linear subspaces $X^2 = X^1 A$ and $X^1 = X^2(-A^T)$ that were seen to be orthogonal in our earlier discussion of (homogeneous) dual linear systems. Thus, in summary, P and Q are complementary orthogonal linear manifolds of dimensions $m$ and $n$.

With respect to the partitioning $(x_1, \dots, x_m \mid x_{m+1}, \dots, x_{m+n})$ of the $m+n$ coordinates into $X^1$ and $X^2$, indicated schematically in Fig. 2, P has the matrix $A$ as "$X^2{:}X^1$-slope" and $X^1 = 0$, $X^2 = -C$ as "$X^2$-intercept," while Q has the negative-transpose matrix $-A^T$ as "$X^1{:}X^2$-slope" and $X^1 = B^T$, $X^2 = 0$ as "$X^1$-intercept."

Fig. 2. The m-subspace $X^2 = X^1 A$, the n-subspace $X^1 = X^2(-A^T)$, and the n-manifold $Q: X^1 = X^2(-A^T) + B^T$ (schematic).

Passing from schema (7) to schema (8) by any succession of elementary transformations, excluding pivots in the border row and column, we get another "slope-intercept" representation,

$$X^2_\pi = X^1_\pi A_\pi - C_\pi \quad\text{and}\quad X^1_\pi = X^2_\pi(-A_\pi^T) + B_\pi^T, \tag{9}$$

of the same linear manifolds P and Q, because combinatorial equivalence leaves invariant the solution sets specifying P and Q. With respect to the new partitioning $(x_{\bar 1}, \dots, x_{\bar m} \mid x_{\overline{m+1}}, \dots, x_{\overline{m+n}})$ of the $m+n$ coordinates into $X^1_\pi$ and $X^2_\pi$, P has the matrix $A_\pi$ as "$X^2_\pi{:}X^1_\pi$-slope" and $X^1_\pi = 0$, $X^2_\pi = -C_\pi$ as "$X^2_\pi$-intercept," while Q has the negative-transpose matrix $-A_\pi^T$ as "$X^1_\pi{:}X^2_\pi$-slope" and $X^1_\pi = B_\pi^T$, $X^2_\pi = 0$ as "$X^1_\pi$-intercept."

In terms of this analytic geometry, the aim of the Dantzig simplex method is to pass, if possible, from the initial "slope-intercept" representation of P and Q, based on schema (7), to a terminal "slope-intercept" representation (9), based on schema (8), in which the intercepts $-C_\pi$ and $B_\pi^T$ will both be nonnegative. In terminal form (I) this aim is achieved. In terminal form (II) we fail because Q does not intersect the "nonnegative $(m+n)$-orthant" R, consisting of all points $X \ge 0$, and so cannot have a nonnegative intercept at all; in terminal form (III) P does not intersect the orthant R; and in terminal form (IV) neither P nor Q intersects the orthant R.
HOMOGENEOUS LINEAR PROGRAMS.
The schema

$$
\begin{array}{c|c|ccc|l}
 & 1 & y_{m+1} & \cdots & y_{m+n} & \\ \hline
x_1 & a_{10} & a_{11} & \cdots & a_{1n} & = -y_1 \\
\vdots & \vdots & \vdots & & \vdots & \quad\vdots \\
x_m & a_{m0} & a_{m1} & \cdots & a_{mn} & = -y_m \\ \hline
 & = \xi & = x_{m+1} & \cdots & = x_{m+n} &
\end{array}
\tag{10}
$$

(border column $-B$, body $A$; y's $\ge 0$; $\xi$ max with x's $\ge 0$), obtained by deleting the border row of schema (7), presents a mixed form of dual linear systems, one homogeneous and one not. Here the x-program, to maximize $\xi = -X^1 B$ (or minimize $X^1 B = -\xi$) constrained by $X^1 \ge 0$, $X^1 A = X^2 \ge 0$, is a "homogeneous linear program" with homogeneous objective function and homogeneous constraints; and the y-program, to solve $B - AY^2 = Y^1 \ge 0$, $Y^2 \ge 0$, is merely a nonhomogeneous "feasibility program."

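In this case the terminal schema (8) with border row deleted has one of just two possible forms:

$$
\text{(I)}\quad
\begin{array}{c|c|ccc|l}
 & 1 & 0 & \cdots & 0 & \\ \hline
\oplus & \ominus & & & & = -\oplus \\
\vdots & \vdots & & & & \quad\vdots \\
\oplus & \ominus & & & & = -\oplus \\ \hline
 & = \ominus & & & &
\end{array}
\qquad\qquad
\text{(II)}\quad
\begin{array}{c|c|ccc|l}
 & 1 & \oplus & \cdots & \oplus & \\ \hline
0 & & & & & \\
\vdots & & & & & \\
+ & + & \oplus & \cdots & \oplus & = -(-) \\ \hline
 & = + & = \oplus & \cdots & = \oplus &
\end{array}
$$

Form I shows that $\max \xi = 0$, since $\xi \le 0$ for any nonnegative x's and $\xi = 0$ trivially for all x's zero, and that the y-program has a feasible solution obtained by taking $y_{\overline{m+1}} = \dots = y_{\overline{m+n}} = 0$. Form II shows that $\xi$ is unbounded above, since $\xi \to +\infty$ for $x_{\bar 1} = \dots = x_{\overline{m-1}} = 0$ and $x_{\bar m} \to +\infty$, and that the y-program is infeasible, since $y_{\bar m} < 0$ for any assignment of nonnegative values to $y_{\overline{m+1}}, \dots, y_{\overline{m+n}}$.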

These two mutually exclusive terminal forms provide simple constructive means of establishing transposition-duality theorems such as Farkas' theorem for linear inequalities. In particular, these two alternative forms can be used to prove the author's lemma [Ref. 2, page 5] from which classical and sharpened transposition-duality theorems follow [in Ref. 2].
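In the notation of schema (10), the two alternatives say that either the y-program has a solution ($Y^2 \ge 0$, $B - AY^2 \ge 0$) or there is an $X^1 \ge 0$ with $X^1 A \ge 0$ and $X^1 B < 0$ (a nonnegative x-solution with $\xi > 0$, which can be scaled without bound), never both: if both held, then $0 \le (X^1 A) Y^2 = X^1 (A Y^2) \le X^1 B < 0$. The Python sketch below only checks candidate certificates on a made-up $2 \times 1$ example; it does not search for them by pivoting, and the function names are hypothetical.

```python
def y_feasible(A, B, Y2):
    """Is Y^2 a solution of the feasibility program Y^2 >= 0, Y^1 = B - A Y^2 >= 0?"""
    Y1 = [b - sum(a * y for a, y in zip(row, Y2)) for row, b in zip(A, B)]
    return all(v >= 0 for v in list(Y2) + Y1)

def x_certificate(A, B, X1):
    """Is X^1 >= 0 with X^1 A >= 0 and X^1 B < 0 (so that xi = -X^1 B > 0)?"""
    X2 = [sum(x * row[j] for x, row in zip(X1, A)) for j in range(len(A[0]))]
    return all(v >= 0 for v in list(X1) + X2) and sum(x * b for x, b in zip(X1, B)) < 0

A = [[1], [-1]]        # constraints y <= b_1 and -y <= b_2
B_bad = [-1, -1]       # y <= -1 and y >= 1: no solution
B_ok = [2, -1]         # 1 <= y <= 2

print(x_certificate(A, B_bad, [1, 1]), y_feasible(A, B_ok, [1]))
```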
MULTIPLICITY OF SOLUTIONS.

We now look again at form (I) of the terminal schema (8), the "successful" form, in which the dual linear programs possess optimal solutions. If none of the $m+n$ border entries (excluding $\bar a_{00}$) is zero, then the terminal schema has the form



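$$
\text{(a)}\quad
\begin{array}{c|c|ccc|l}
 & 1 & 0 & \cdots & 0 & \\ \hline
1 & \bar a_{00} & + & \cdots & + & = \min \\ \hline
0 & - & & & & = -(+) \\
\vdots & \vdots & & & & \quad\vdots \\
0 & - & & & & = -(+) \\ \hline
 & = \max & = + & \cdots & = + &
\end{array}
$$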

In this case both programs have unique optimal solutions. 

On the other hand, if at least one of the m+n border entries is zero, 
then at least one of the dual programs has a multiplicity of solutions. In 
this event, it can be shown that a further succession of elementary 
transformations (excluding border entries as pivots) leads to a normal 
form of one of the following three types: 



(b) Normal form with a block of "positive-headed" columns ($\eta = \min$, $\xi = \max$).

(c) Normal form with a block of "negative-headed" rows ($\eta = \min$, $\xi = \max$).

(d) Normal form with a block of "positive-headed" columns and a block of "negative-headed" rows ($\eta = \min$, $\xi = \max$).
In a block of "positive-headed" columns each column has its first entry positive, or a later entry positive which is preceded by zero entries only. In a block of "negative-headed" rows each row has its first entry negative, or a later entry negative which is preceded by zero entries only.

In normal form (b), with its $\mu + 1$ by $n$ block of "positive-headed" columns, the x-program has a $\mu$-dimensional closed convex set of optimal solutions and the y-program has a unique optimal solution. The symbols + for the marginal x's indicate an optimal solution in the (relative) interior of the $\mu$-dimensional convex set. The existence of such an optimal solution can be inferred from the block of "positive-headed" columns.

In normal form (c), with its $m$ by $\nu + 1$ block of "negative-headed" rows, the x-program has a unique optimal solution and the y-program has a $\nu$-dimensional closed convex set of optimal solutions. The marginal symbols for the y's at the top and at the right indicate an optimal solution in the (relative) interior of the $\nu$-dimensional convex set. The existence of such an optimal solution can be inferred from the block of "negative-headed" rows.

In normal form (d), with its $\mu + 1$ by $n - \nu$ block of "positive-headed" columns and its $m - \mu$ by $\nu + 1$ block of "negative-headed" rows, the x-program has a $\mu$-dimensional closed convex set of optimal solutions and the y-program has a $\nu$-dimensional closed convex set of optimal solutions. Marginal symbols indicate (relative) interior solutions, as previously described.

The four forms (a), (b), (c), (d) exhaust the possibilities for optimal 
solutions in the "successful" case. For a specific pair of dual linear 
programs having optimal solutions, only one of the four forms can occur. 

Note the full "complementary slackness" in the four forms (a), (b), (c), 
(d). Opposite each positive or zero component of the optimal x-solution 
indicated there is, without exception, a zero or positive component of the 
optimal y-solution indicated. 
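Complementary slackness can be seen on a concrete case. The Python sketch below uses an assumed small program (maximize $3v_1 + 5v_2$ subject to $v_1 \le 4$, $2v_2 \le 12$, $3v_1 + 2v_2 \le 18$, $V \ge 0$; not an example from the text) with optimal solutions $V = (2, 6)$ and $U = (0, 3/2, 1)$, which give $CV = UB = 36$. Its terminal schema, reached by pivoting, has no zero border entries, so it falls under form (a), and every pair $(x_i, y_i)$ has exactly one positive member.

```python
from fractions import Fraction

def xy_pairs(A, B, C, U, V):
    """Pair x_i with y_i for X = (U, UA - C) and Y = (B - AV, V)."""
    m, n = len(A), len(A[0])
    X = list(U) + [sum(U[i] * A[i][j] for i in range(m)) - C[j] for j in range(n)]
    Y = [B[i] - sum(A[i][j] * V[j] for j in range(n)) for i in range(m)] + list(V)
    return list(zip(X, Y))

A, B, C = [[1, 0], [0, 2], [3, 2]], [4, 12, 18], [3, 5]
U, V = [0, Fraction(3, 2), 1], [2, 6]
pairs = xy_pairs(A, B, C, U, V)
print(pairs)
# complementary slackness: every product is zero;
# strict complementarity: exactly one member of each pair is positive
print(all(x * y == 0 for x, y in pairs), all((x > 0) != (y > 0) for x, y in pairs))
```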
A Mutual Primal-Dual Simplex Method



Michel L. Balinski
Ralph E. Gomory 

1. SIMPLEX METHODS 

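A pair of dual linear programs

Primal (Row) Program: minimize

$$y_0 = a_{00} + a_{01} y_{m+1} + \dots + a_{0n} y_{m+n}$$

constrained by

$$
\begin{aligned}
-y_1 &= a_{10} + a_{11} y_{m+1} + \dots + a_{1n} y_{m+n} \le 0, \\
&\ \ \vdots \\
-y_m &= a_{m0} + a_{m1} y_{m+1} + \dots + a_{mn} y_{m+n} \le 0, \\
&\quad y_{m+1} \ge 0, \dots, y_{m+n} \ge 0;
\end{aligned}
$$

Dual (Column) Program: maximize

$$x_0 = a_{00} + x_1 a_{10} + \dots + x_m a_{m0}$$

constrained by

$$
\begin{aligned}
x_{m+1} &= a_{01} + x_1 a_{11} + \dots + x_m a_{m1} \ge 0, \\
&\ \ \vdots \\
x_{m+n} &= a_{0n} + x_1 a_{1n} + \dots + x_m a_{mn} \ge 0, \\
&\quad x_1 \ge 0, \dots, x_m \ge 0,
\end{aligned}
\tag{1.1}
$$

is conveniently exhibited in the tableau

$$
\begin{array}{c|c|ccc|l}
 & 1 & y_{m+1} & \cdots & y_{m+n} & \\ \hline
1 & a_{00} & a_{01} & \cdots & a_{0n} & = y_0 \\
x_1 & a_{10} & a_{11} & \cdots & a_{1n} & = -y_1 \\
\vdots & \vdots & \vdots & & \vdots & \quad\vdots \\
x_m & a_{m0} & a_{m1} & \cdots & a_{mn} & = -y_m \\ \hline
 & = x_0 & = x_{m+1} & \cdots & = x_{m+n} &
\end{array}
\tag{1.2}
$$

in which the primal program is read along the rows and the dual program down the columns.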

In the dual linear programs (1.1) or their tableau (1.2), $y_0, y_1, \dots, y_m$, the basic variables of the primal program, are expressed in terms of $y_{m+1}, \dots, y_{m+n}$, the nonbasic variables; similarly, $x_0, x_{m+1}, \dots, x_{m+n}$, the basic variables of the dual program, are expressed in terms of $x_1, \dots, x_m$, the nonbasic variables.

A pivot step on (1.1) or (1.2) with pivot entry $a_{ij} \ne 0$ ($i, j \ne 0$) is a Gauss-Jordan or complete elimination step which simultaneously solves the $i$th (row) equation of the primal for $y_{m+j}$ and the $j$th (column) equation of the dual for $x_i$, and uses these equations to eliminate $y_{m+j}$ and $x_i$ from the remaining row and column equations at the cost of introducing $y_i$ and $x_{m+j}$. The $y_{m+j}$ and $x_i$ thereby become basic variables and $y_i$ and $x_{m+j}$ nonbasic variables. The pivot step with pivot entry $\alpha = a_{ij} \ne 0$ takes the tableau

$$
\begin{array}{c|cc|l}
 & y_{m+j} & y_{m+k} & \\ \hline
x_i & \alpha & \beta & = -y_i \\
x_h & \gamma & \delta & = -y_h \\ \hline
 & = x_{m+j} & = x_{m+k} &
\end{array}
\tag{1.3}
$$

into the tableau

$$
\begin{array}{c|cc|l}
 & y_i & y_{m+k} & \\ \hline
x_{m+j} & 1/\alpha & \beta/\alpha & = -y_{m+j} \\
x_h & -\gamma/\alpha & \delta - \beta\gamma/\alpha & = -y_h \\ \hline
 & = x_i & = x_{m+k} &
\end{array}
\tag{1.4}
$$

the other marginal variables and labels remaining in the same positions. Successive tableaus obtained by pivot steps simply reexpress the original pair of dual linear programs through different partitions into sets of basic and nonbasic variables. Any such tableau has the form



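$$\begin{array}{r|cccc|l}
 & 1 & y'_{m+1} & \cdots & y'_{m+n} & \\ \hline
1 & a'_{00} & a'_{01} & \cdots & a'_{0n} & = y_0 \\
x'_1 & a'_{10} & a'_{11} & \cdots & a'_{1n} & = -y'_1 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
x'_m & a'_{m0} & a'_{m1} & \cdots & a'_{mn} & = -y'_m \\ \hline
 & = x_0 & = x'_{m+1} & \cdots & = x'_{m+n} &
\end{array} \qquad (1.5)$$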
where the primed variables are a rearrangement of the original variables, and the primed entries are determined by the succession of preceding pivot steps.

Basic solutions to both programs are associated with any tableau (1.5). They are obtained by setting the nonbasic variables equal to zero, which determines the values of the basic variables:
$$y'_1 = -a'_{10}, \ldots, y'_m = -a'_{m0}, \qquad y_0 = a'_{00} = x_0, \qquad x'_{m+1} = a'_{01}, \ldots, x'_{m+n} = a'_{0n}.$$

If $-a'_{10} \ge 0, \ldots, -a'_{m0} \ge 0$, a basic feasible primal solution obtains. If $a'_{01} \ge 0, \ldots, a'_{0n} \ge 0$, a basic feasible dual solution obtains. If both primal and dual basic feasible solutions obtain, then they constitute optimal solutions to the programs, since they give both programs the common objective value $y_0 = x_0 = a'_{00}$.
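The pivot rule (1.3)-(1.4), together with the reading of basic solutions from (1.5), can be checked mechanically. The following is a minimal Python sketch under these assumptions:

- The tableau is stored as a list of rows. Row 0 is the objective row and column 0 is the constant column.
- Row and column labels are stored as the indices k of the variables $y_k$.
- The names `pivot` and `basic_solutions` are illustrative, not taken from the paper.

Exact `Fraction` arithmetic avoids rounding in the entries.

```python
from fractions import Fraction


def pivot(a, rows, cols, i, j):
    """Pivot the condensed tableau `a` on entry a[i][j] (i, j >= 1).

    a[0] is the objective row (a_00, ..., a_0n); a[s][0] is column 0.
    rows[s-1] is the index k of the basic y_k labelling row s (as -y_k);
    cols[t-1] is the index k of the nonbasic y_k heading column t.
    Entries change as p -> 1/p, q -> q/p, r -> -r/p, s -> s - q*r/p.
    """
    p = Fraction(a[i][j])
    if p == 0:
        raise ValueError("pivot entry must be nonzero")
    new = []
    for s, row in enumerate(a):
        new_row = []
        for t, entry in enumerate(row):
            if s == i and t == j:
                new_row.append(1 / p)                         # pivot entry
            elif s == i:
                new_row.append(entry / p)                     # rest of pivot row
            elif t == j:
                new_row.append(-entry / p)                    # rest of pivot column
            else:
                new_row.append(entry - a[i][t] * a[s][j] / p) # all other entries
        new.append(new_row)
    rows, cols = list(rows), list(cols)
    rows[i - 1], cols[j - 1] = cols[j - 1], rows[i - 1]       # y_i and y_{m+j} trade places
    return new, rows, cols


def basic_solutions(a, rows, cols):
    """Basic solutions of tableau (1.5): every nonbasic variable is zero.

    Returns (a'_00, primal, dual) where primal[k] is the value of y_k and
    dual[k] is the value of x_k for every label index k.
    """
    primal = {k: 0 for k in cols}          # nonbasic y's
    dual = {k: 0 for k in rows}            # nonbasic x's
    for s, k in enumerate(rows, start=1):
        primal[k] = -a[s][0]               # y'_s = -a'_s0
    for t, k in enumerate(cols, start=1):
        dual[k] = a[0][t]                  # x'_{m+t} = a'_0t
    return a[0][0], primal, dual
```

For example, take the tableau with rows (0, 1, 1), (2, -1, -1), (1, -1, 1). Pivoting it on its (1, 1) entry yields a tableau in the optimal form (1.6), with common value $y_0 = x_0 = 2$. Pivoting again at the same position restores the original tableau.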

A simplex method for solving a pair of dual linear programs is a finite sequence of tableaus, exhibiting equivalent pairs of dual linear programs, obtained by successive pivot steps with prescribed pivot entry choice rules. It ends with a tableau exhibiting either optimal solutions to both programs or the noncompatibility of the primal and/or dual constraints. Letting + denote nonnegative entries and - denote nonpositive entries, these cases can be exhibited in tableau form:



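(optimal solutions)
$$\begin{array}{c|ccc} a'_{00} & + & \cdots & + \\ \hline - & & & \\ \vdots & & & \\ - & & & \end{array} \qquad (1.6)$$

(primal or row constraints noncompatible): some row $i \ge 1$ of the form
$$\begin{array}{c|ccc} a'_{i0} > 0 & + & \cdots & + \end{array} \qquad (1.7)$$

(dual or column constraints noncompatible): some column $j \ge 1$ of the form
$$\begin{array}{c} a'_{0j} < 0 \\ \hline - \\ \vdots \\ - \end{array} \qquad (1.8)$$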
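The three terminal forms (1.6)-(1.8) amount to sign tests on row 0 and column 0. Below is a hedged sketch using the same list-of-rows layout (row 0 objective, column 0 constants). The function name `terminal_form` is illustrative. When a tableau exhibits both (1.7) and (1.8), this sketch simply reports the first form it finds.

```python
def terminal_form(a):
    """Return "optimal", "primal infeasible", "dual infeasible" or None.

    a[0] = [a'_00, a'_01, ..., a'_0n]; a[i][0] = a'_i0 for i = 1..m.
    """
    m, n = len(a) - 1, len(a[0]) - 1
    # (1.7): a row with a'_i0 > 0 whose other entries are all nonnegative
    for i in range(1, m + 1):
        if a[i][0] > 0 and all(a[i][j] >= 0 for j in range(1, n + 1)):
            return "primal infeasible"
    # (1.8): a column with a'_0j < 0 whose other entries are all nonpositive
    for j in range(1, n + 1):
        if a[0][j] < 0 and all(a[i][j] <= 0 for i in range(1, m + 1)):
            return "dual infeasible"
    # (1.6): row 0 nonnegative and column 0 nonpositive, apart from a'_00
    if all(a[0][j] >= 0 for j in range(1, n + 1)) and all(a[i][0] <= 0 for i in range(1, m + 1)):
        return "optimal"
    return None
```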
A primal (dual) simplex method is a simplex method that begins with a tableau exhibiting a primal (dual) basic feasible solution and uses pivot steps which maintain primal (dual) feasibility in each succeeding tableau. A primal (dual) pivot choice rule is as follows.

If a tableau (1.5) does not exhibit optimal solutions to both programs, there must exist an $a'_{0j} < 0$ for some $j$ (an $a'_{i0} > 0$ for some $i$). Then either (a) or (b) holds:

(a) Every entry in the column of $a'_{0j} < 0$ is nonpositive (every entry in the row of $a'_{i0} > 0$ is nonnegative). The tableau then exhibits the noncompatibility of the dual constraints (of the primal constraints).

(b) There exist positive (negative) entries. Choose as pivot entry $a'_{kj} > 0$ ($a'_{il} < 0$) satisfying

$$\frac{a'_{k0}}{a'_{kj}} = \max_{a'_{sj} > 0} \frac{a'_{s0}}{a'_{sj}} \qquad \left( \frac{a'_{0l}}{a'_{il}} = \max_{a'_{is} < 0} \frac{a'_{0s}}{a'_{is}} \right)$$
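The ratio tests in rule (b) can be written out directly. The sketch below uses the same hypothetical list-of-rows layout as (1.5):

- When several columns (rows) qualify, it takes the first one, which the rule as stated permits.
- Ties in the maximum go to the first row (column) found.
- The function names are illustrative.

```python
def primal_pivot_choice(a):
    """Primal rule for a primal feasible tableau (every a'_i0 <= 0, i >= 1).

    Returns the pivot position (k, j), or the name of a terminal case.
    """
    m, n = len(a) - 1, len(a[0]) - 1
    columns = [j for j in range(1, n + 1) if a[0][j] < 0]
    if not columns:
        return "optimal"                          # row 0 nonnegative: form (1.6)
    j = columns[0]                                # any column with a'_0j < 0 will do
    rows = [s for s in range(1, m + 1) if a[s][j] > 0]
    if not rows:
        return "dual infeasible"                  # case (a): column nonpositive
    k = max(rows, key=lambda s: a[s][0] / a[s][j])    # a'_k0/a'_kj = max a'_s0/a'_sj
    return k, j


def dual_pivot_choice(a):
    """Dual rule for a dual feasible tableau (every a'_0j >= 0, j >= 1)."""
    m, n = len(a) - 1, len(a[0]) - 1
    rows = [i for i in range(1, m + 1) if a[i][0] > 0]
    if not rows:
        return "optimal"                          # column 0 nonpositive: form (1.6)
    i = rows[0]                                   # any row with a'_i0 > 0 will do
    columns = [s for s in range(1, n + 1) if a[i][s] < 0]
    if not columns:
        return "primal infeasible"                # case (a): row nonnegative
    l = max(columns, key=lambda s: a[0][s] / a[i][s])  # a'_0l/a'_il = max a'_0s/a'_is
    return i, l
```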

If an initial tableau does not exhibit a primal (dual) basic feasible solution, some special device is introduced. It enables consideration of an allied problem whose solution provides a primal (dual) basic feasible solution for the original problem. The original Dantzig method [1] is a primal simplex method; the Lemke paper [5] describes a dual simplex method.

The proofs that a simplex method terminates in a finite number of pivot steps rest on two facts. First, any tableau is uniquely determined by its associated nonbasic variables (of the primal or of the dual program). Second, there exist at most $\binom{m+n}{m} = \binom{m+n}{n}$ possible sets of nonbasic variables. Hence any rule for choosing pivot steps that assures no tableau is ever repeated guarantees finiteness.

The finiteness proof for a primal (dual) simplex method in which no "degeneracies" occur, i.e., in which $a'_{i0} < 0$, $i \neq 0$ ($a'_{0j} > 0$, $j \neq 0$), is clear. Each pivot step strictly decreases (increases) the value of $a'_{00}$ and thereby assigns an order to the sequence of tableaus. If, however, degeneracy occurs, some form of lexicographic order must be introduced to avoid the possibility of cycling.
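Putting the primal pivot choice rule and the pivot step together gives a bare primal simplex method. This is a minimal sketch for a tableau that is already primal feasible. It deliberately omits the lexicographic safeguard the text calls for under degeneracy, so it can only guard against cycling with a step limit. The function names and the step limit are assumptions for illustration.

```python
from fractions import Fraction


def pivot(a, i, j):
    """Tucker pivot on a[i][j]: p -> 1/p, q -> q/p, r -> -r/p, s -> s - q*r/p."""
    p = a[i][j]
    new = []
    for s, row in enumerate(a):
        new_row = []
        for t, entry in enumerate(row):
            if s == i and t == j:
                new_row.append(1 / p)
            elif s == i:
                new_row.append(entry / p)
            elif t == j:
                new_row.append(-entry / p)
            else:
                new_row.append(entry - a[i][t] * a[s][j] / p)
        new.append(new_row)
    return new


def primal_simplex(a, rows, cols, max_steps=1000):
    """Primal simplex method on a primal feasible tableau (every a_i0 <= 0).

    Returns (status, tableau, rows, cols); status is "optimal" (form (1.6))
    or "dual infeasible" (form (1.8)). No anti-cycling rule is included.
    """
    a = [[Fraction(v) for v in row] for row in a]
    rows, cols = list(rows), list(cols)
    if any(row[0] > 0 for row in a[1:]):
        raise ValueError("the tableau is not primal feasible")
    for _ in range(max_steps):
        negative = [t for t in range(1, len(a[0])) if a[0][t] < 0]
        if not negative:
            return "optimal", a, rows, cols
        j = negative[0]
        positive = [s for s in range(1, len(a)) if a[s][j] > 0]
        if not positive:
            return "dual infeasible", a, rows, cols
        i = max(positive, key=lambda s: a[s][0] / a[s][j])   # ratio test of rule (b)
        a = pivot(a, i, j)
        rows[i - 1], cols[j - 1] = cols[j - 1], rows[i - 1]
    raise RuntimeError("step limit reached; cycling under degeneracy is possible")
```

Consider the tableau with rows (0, -1, -1), (-4, 1, 1), (-3, 1, 0), that is, minimize $-y_3 - y_4$ subject to $y_3 + y_4 \le 4$ and $y_3 \le 3$. The sketch takes two pivot steps and stops in form (1.6) with $a'_{00} = -4$, $y_3 = 3$ and $y_4 = 1$.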



2. A MUTUAL PRIMAL-DUAL SIMPLEX METHOD 

We describe here a simplex method for directly solving any pair of 
dual linear programs (1.1) or (1.2). The method specifies pivot choices for 
any tableau whether feasible or not, and degenerate or not, which lead to a 
tableau exhibiting a primal feasible solution (or, primal infeasibility) and 
then to a tableau exhibiting optimal solutions to both programs (or, primal 
objective unboundedness and dual infeasibility). This is accomplished by 
using a primal simplex pivot choice rule until primal degeneracies occur; 
then a dual simplex pivot choice rule is used on a subtableau corresponding 
to primal zero valued basic variables until the degeneracies are resolved. 
If "sub-dual degeneracies" are or come to be present in the subtableau, a primal simplex pivot choice rule is used on a sub-subtableau until these degeneracies are resolved, and so forth. The hierarchy of tableau, subtableau, sub-subtableau, etc., is used to establish a hierarchy of goals
which are associated with every tableau. Every pivot step leads to a 
strict "improvement" in one of the goals, with goals higher in the 
hierarchy remaining unaffected. This serves to order the sequence of 
tableaus, thus assuring termination of the method in a finite number of 
pivot steps. 

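Every tableau (1.5) in the sequence of tableaus obtained by successive pivot steps has associated with it a hierarchy of numbered subtableaus. Each subtableau has a distinguished entry (and hence a distinguished row and column). Odd-numbered subtableaus $k$ have the form

$$\begin{array}{c|ccc} -\beta(k) & * & \cdots & * \\ \hline \le 0 & * & \cdots & * \\ \vdots & \vdots & & \vdots \\ \le 0 & * & \cdots & * \end{array} \qquad (2.1)$$

("primal or row feasible form")

with $\alpha(k) \ge 1$ the number of rows and $-\beta(k)$ the value of the distinguished entry. Even-numbered subtableaus $k$ have the form

$$\begin{array}{c|ccc} \beta(k) & \ge 0 & \cdots & \ge 0 \\ \hline * & * & \cdots & * \\ \vdots & \vdots & & \vdots \\ * & * & \cdots & * \end{array} \qquad (2.2)$$

("dual or column feasible form")

with $\alpha(k) \ge 1$ the number of columns and $\beta(k)$ the value of the distinguished entry.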

The hierarchy of subtableaus associated with a tableau is initiated as follows. Subtableau 1 consists of all columns and all rows of (1.1) with $a_{i0} \le 0$, with distinguished entry some $a_{k0} > 0$. (If all $a_{i0} \le 0$ and the entire tableau is in primal feasible form, the entire tableau is taken as "subtableau 1".)

Given any tableau, suppose a subtableau $k$ in primal (dual) feasible form has been defined with distinguished row $R$ and column $C$. Suppose further that $C$ (that $R$) contains zeros and that $R$ is not all nonnegative ($C$ is not all nonpositive). Then a subtableau $k + 1$ in dual (primal) feasible form is defined. It consists of the rows (columns) corresponding to the zeros of $C$ (of $R$) and the columns (rows) corresponding to the nonnegative entries of $R$ (nonpositive entries of $C$). Its distinguished entry is some negative entry of $R$ (some positive entry of $C$). Schematically,
[Diagram: subtableau $k$ with distinguished row $R$ and distinguished column $C$; the subdiagram enclosed in solid lines is subtableau $k + 1$.]



where the whole diagram represents a primal feasible form subtableau k, 
and the subdiagram enclosed in solid lines a dual feasible form subtableau 
k + 1. 

Associate with any tableau and its subtableaus a hierarchy of goals. Goal $k$ ($k = 1, 2, \ldots$) is to pivot to a new tableau whose new subtableau $k$ (if it exists) has $\alpha(k)$ larger, or has $\alpha(k)$ unchanged but $\beta(k)$ larger, while $\alpha(i)$ and $\beta(i)$ for $i < k$ remain unchanged.
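The hierarchy of goals is, in effect, a lexicographic order:

- Goal $k$ counts only when goals $1, \ldots, k-1$ are unchanged.
- Within goal $k$, $\alpha(k)$ is compared before $\beta(k)$.

Below is a minimal sketch of that comparison, assuming each tableau's hierarchy is recorded as a list of $(\alpha(k), \beta(k))$ pairs. How the text ranks a subtableau that ceases to exist is not modeled here. The function name is hypothetical.

```python
def strictly_improves(old, new):
    """Compare goal hierarchies [(alpha(1), beta(1)), (alpha(2), beta(2)), ...].

    True if the first differing goal k has alpha(k) larger, or alpha(k) equal
    and beta(k) larger (goals i < k unchanged); False if that goal got worse;
    None if the common prefix is identical (a vanished subtableau is not modeled).
    """
    for (alpha_old, beta_old), (alpha_new, beta_new) in zip(old, new):
        if (alpha_new, beta_new) != (alpha_old, beta_old):
            # Python compares tuples lexicographically: alpha first, then beta.
            return (alpha_new, beta_new) > (alpha_old, beta_old)
    return None
```

If every pivot step returns True under such a comparison, then no tableau can recur. Since there are only finitely many tableaus, this is the ordering argument used for termination.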

Suppose, now, that we have reached the $p$th tableau, with entries $a^p_{ij}$, along with its well-defined hierarchy of subtableaus and their associated values $\alpha_p(k)$, $\beta_p(k)$. We describe the choice of pivot entry that yields the $(p+1)$th tableau, and the hierarchy of subtableaus associated with the $(p+1)$th tableau.

Rules 

Suppose the subtableau with highest index K is in primal (dual) feasible 
form. 

(a) Apply the primal (dual) simplex pivot choice rule (see above) to the 
subtableau K. Maintain same hierarchy of subtableaus, except K. 
Redefine K and any subsequent ones if possible. 



If rule (a) is not applicable then one of the two possibilities (b) 
or (c) must hold. 



(b) 
Choose as pivot entry the distinguished entry. Maintain the same hierarchy of subtableaus, except $K - 1$ and $K$. Redefine $K - 1$ and any subsequent ones if possible.



(c)



Choose as pivot entry the negative (positive) entry in the distinguished row (column) whose column is nonpositive (whose row is nonnegative). Maintain the same hierarchy of subtableaus, except $K$. Redefine $K$ and any subsequent ones if possible.

Finally, if ever the choice of pivot entry is an element of the first row (some $a_{0j}$) or of the first column (some $a_{i0}$), stop.

The stopping entry determines the outcome:

- If the choice of pivot entry is $a_{00}$, optimal solutions to both programs obtain.
- If the choice of pivot entry is some $a_{i0}$, the primal or row program has no feasible solution.
- If the choice of pivot entry is some $a_{0j}$, the primal program is in feasible form but the dual or column program has no feasible solution.

If the pivot entry is chosen according to (a), either there is no $K$th subtableau or $\alpha_{p+1}(K) \ge \alpha_p(K)$. Moreover, if $\alpha_{p+1}(K) = \alpha_p(K)$ then $\beta_{p+1}(K) > \beta_p(K)$, owing to the absence of "$K$th degeneracies". Meanwhile $\alpha_{p+1}(i) = \alpha_p(i)$ and $\beta_{p+1}(i) = \beta_p(i)$ for $i < K$, since the pivot entry has zeros in the rows and columns that could affect these values.

If the pivot is chosen according to (b), either there is no $(K-1)$th subtableau or $\alpha_{p+1}(K-1) > \alpha_p(K-1)$ [see subtableau in (b)]. Again, and for the same reason, $\alpha_{p+1}(i) = \alpha_p(i)$ and $\beta_{p+1}(i) = \beta_p(i)$ for $i < K - 1$.

Finally, if the pivot is chosen according to (c), either there is no $K$th subtableau or $\alpha_{p+1}(K) > \alpha_p(K)$ [see subtableau in (c)], while again $\alpha_{p+1}(i) = \alpha_p(i)$ and $\beta_{p+1}(i) = \beta_p(i)$ for $i < K$.

Therefore, this choice of pivot entry and assignment of subtableaus always leads to strict improvement in some goal. This orders the sequence of tableaus obtained in successive pivot steps. As noted above, this suffices to assure termination of the method in a finite number of pivot steps.

A. W. Tucker has pointed out that the inductive counterpart of this construction leads to a particularly simple and appealing proof of termination. The induction is made on the number $m + n$ of primal (or dual) variables.
3. SOME REMARKS

It is perhaps of interest to review the primal-dual algorithm of Dantzig, Ford, and Fulkerson [2], to enable comparison with the algorithm proposed here. By our definition the primal-dual algorithm is a simplex method with rather special pivot choice rules, applied not directly to the problem as stated but to an "extended" problem.

The problem to be solved and its dual as posed in [2] is 



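Primal (Row) Program: minimize
$$y_0 = a_{01}y_{m+1} + \cdots + a_{0n}y_{m+n}$$
constrained by
$$0 = a_{i0} + a_{i1}y_{m+1} + \cdots + a_{in}y_{m+n} \quad (i = 1, \ldots, m), \qquad y_{m+1} \ge 0, \ldots, y_{m+n} \ge 0;$$

Dual (Column) Program: maximize
$$x_0 = x_1a_{10} + \cdots + x_ma_{m0}$$
constrained by
$$x_{m+j} = a_{0j} + x_1a_{1j} + \cdots + x_ma_{mj} \ge 0 \quad (j = 1, \ldots, n) \qquad (3.1)$$
with $x_1, \ldots, x_m$ unrestricted in sign.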



where it is assumed that the dual program has a feasible solution. If it does not, an extra variable $y_{m+n+1} \ge 0$ and the constraint $y_{m+1} + \cdots + y_{m+n} + y_{m+n+1} = -a_{m+1,0}$, with $-a_{m+1,0}$ arbitrarily large, can be added to the primal program; this assures an easily found initial feasible solution to the new dual problem. An "extended" problem and its dual are then defined, which can be exhibited in the tableau



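$$\begin{array}{r|cccc|l}
 & 1 & y_{m+1} & \cdots & y_{m+n} & \\ \hline
1 & b_{00} & b_{01} & \cdots & b_{0n} & = z \\
\sigma_1 & a_{10} & a_{11} & \cdots & a_{1n} & = -y_1 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
\sigma_m & a_{m0} & a_{m1} & \cdots & a_{mn} & = -y_m \\ \hline
 & = \sigma_0 & = \sigma_{m+1} & \cdots & = \sigma_{m+n} &
\end{array} \qquad (3.2)$$

where $b_{0j} = -(a_{1j} + \cdots + a_{mj})$, $j = 0, 1, \ldots, n$ (rows: Extended Primal Program; columns: Dual to Extended Primal Program).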

Here $z = y_1 + \cdots + y_m$ is to be minimized subject to the row equations and $y_1 \ge 0, \ldots, y_{m+n} \ge 0$. Likewise, $\sigma_0$ is to be maximized subject to the column equations and $\sigma_1 \ge 0, \ldots, \sigma_{m+n} \ge 0$. Since $a_{i0} \le 0$ ($i = 1, \ldots, m$), (3.2) is in primal feasible form. Notice that a feasible solution exists to the primal program (3.1) only if $\min z = 0$.
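The $z$-row of (3.2) comes from substituting each row equation, with the artificial variables entering as $-y_i = a_{i0} + a_{i1}y_{m+1} + \cdots + a_{in}y_{m+n}$, into $z = y_1 + \cdots + y_m$. Below is a small sketch of that construction under these assumptions:

- It uses the list-of-rows layout of (1.5).
- It assumes (3.2) keeps only the $z$-row and the $m$ constraint rows.
- The function name is illustrative.

```python
def extended_tableau(a):
    """Tableau (3.2) for the equality-constrained program (3.1).

    a[i] = [a_i0, a_i1, ..., a_in] for i = 1..m; a[0] (the y_0 row) is not used.
    Row 0 of the result is the z-row b_0j = -(a_1j + ... + a_mj), j = 0..n.
    """
    constraint_rows = [list(r) for r in a[1:]]
    if any(r[0] > 0 for r in constraint_rows):
        raise ValueError("(3.2) presumes a_i0 <= 0 for i = 1, ..., m")
    width = len(a[0])
    # z = sum_i y_i = sum_i -(a_i0 + sum_j a_ij y_{m+j})
    b0 = [-sum(r[j] for r in constraint_rows) for j in range(width)]
    return [b0] + constraint_rows
```

Because every $a_{i0} \le 0$, column 0 of the result is nonpositive below row 0, which is the primal feasible form noted above.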

The primal-dual algorithm can be described as a finite sequence of tableaus, starting with (3.2), obtained by successive pivot steps. With every tableau is associated a (not necessarily basic) feasible solution $\{x_0, x_1, \ldots, x_{m+n}\}$ to the dual or column program (3.1). Given any tableau and its associated $\{x_0, x_1, \ldots, x_{m+n}\}$, a primal pivot choice rule is applied to a subtableau. This subtableau consists of all columns except those corresponding to $y_{m+j}$ for which $x_{m+j} > 0$ ($j = 1, \ldots, n$). If a primal pivot choice cannot be made, the subtableau can only be in the ("optimal") form (1.6), and one of three cases for the complete tableau must hold:

(a) The distinguished entry has value 0.

(b) The distinguished entry has positive value and the distinguished row has some negative entry.

(c) The distinguished entry has positive value and the distinguished row has all nonnegative entries.

If (a) occurs, the values exhibited for $y_{m+1}, \ldots, y_{m+n}$ in the tableau and the associated values for $\{x_0, x_1, \ldots, x_{m+n}\}$ constitute optimal solutions to the dual programs (3.1). This is easily established, since these values are feasible and they make $x_0 = y_0$.

If (b) occurs, a new feasible solution to the dual program (3.1), with values $\{\bar{x}_0, \bar{x}_1, \ldots, \bar{x}_{m+n}\}$ and $\bar{x}_0 > x_0$, is associated with the whole tableau. Namely,

$$\bar{x}_k = x_k + \theta\,\bar{\sigma}_k, \qquad \theta = \min_{\bar{\sigma}_{m+j} < 0} \frac{x_{m+j}}{-\bar{\sigma}_{m+j}} \quad (j = 1, \ldots, n) \qquad (3.3)$$

where $\bar{\sigma}_k$ is the value the tableau exhibits for the variable $\sigma_k$. (This step can easily be described as a "partial pivot step" in which the values of the $x_k$ are altered by the values of the $\sigma_k$.)

If (c) occurs, then the minimum value of $z$ is attained but is positive, implying that no feasible solution to the primal problem (3.1) exists.
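Formula (3.3) is a ratio test on the columns that the subtableau excluded. Below is a hedged sketch with these assumptions:

- The current dual solution $x$ and the tableau values of $\sigma$ are given as sequences indexed $0, \ldots, m+n$.
- The name `restricted_dual_step` is illustrative.
- The sketch returns None when no $\sigma_{m+j}$ is negative; that cannot happen in case (b).

```python
from fractions import Fraction


def restricted_dual_step(x, sigma, m):
    """One update (3.3): x_bar_k = x_k + theta * sigma_k for k = 0, ..., m + n.

    theta is the largest step keeping every x_{m+j} >= 0; only the columns
    with sigma_{m+j} < 0 limit it. Returns None if no such column exists.
    """
    ratios = [Fraction(x[k]) / -sigma[k] for k in range(m + 1, len(x)) if sigma[k] < 0]
    if not ratios:
        return None
    theta = min(ratios)
    return [x[k] + theta * sigma[k] for k in range(len(x))]
```

With $x = (1, 0, 2, 0)$, $\bar{\sigma} = (3, 1, -1, 0)$ and $m = 1$, the step size is $\theta = 2$. The new solution is $(7, 2, 0, 0)$: $x_0$ rises from 1 to 7, and the column that limited the step reaches zero.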

In "geometric language" the primal-dual algorithm defines a sequence
of successive neighboring vertices or extreme points of the convex 
polyhedron defined by the constraints of the extended primal program 
(3.2). It also defines a sequence of feasible points in the dual program 
(3.1) space which are not, in general, extreme. In fact the straight line 
joining two successive such dual feasible points [defined by (3.3)] usually 
lies in the interior of the dual convex polyhedral region (3.1), while the 
points themselves (except possibly the first) lie on some face of the 
polyhedron. In contrast, the mutual primal-dual simplex method defines a
sequence of successive neighboring points (vertices after feasibility is 
achieved) of the convex polyhedron defined by the constraints of the 
original primal problem and visits only extreme points. Although there is 
no logical basis for comparison, intuition would seem to indicate that the 
computational advantage resides with "sticking to extreme points of the 
original problem." 

Of course, the primary interest of these methods is in their application 
to highly "degenerate" problems, for example the assignment and transportation problems. The primal-dual algorithm applied to an assignment or transportation problem is the Hungarian method [4] (though it must be said that it was the ideas of the Hungarian method which led to the development of the primal-dual algorithm). Contrary to widely held beliefs, the Hungarian method (as described in [6]) can be described as a simplex method in much the same way as the primal-dual algorithm has been above. In fact, every operation as given in [6] has its simplex method counterpart. It is hoped (and expected) that the application of the idea of the mutual primal-dual simplex method to the assignment and
transportation problems will lead to a new computational method which 
may better the efficiency of the Hungarian method, for in these 
problems the geometric considerations alluded to above appear to be 
important. Finally, the application of these ideas to the network flow 
algorithms, and particularly the "out of kilter" method of Fulkerson [3], 
should lead to further insight concerning the relationship between these 
specialized algorithms and simplex methods. 
On Cone Functions



Edmund Eisenberg 
I. INTRODUCTION 

In what follows we shall be concerned with generalizations of the following "feasibility" theorem of Fan, Glicksberg, and Hoffman:

Theorem 1: Let $K$ be a convex subset of $R^n$, and let $f_i: K \to R$, $i = 1, \ldots, m$, be convex functions. Then one, and only one, of the following conditions holds:

There is an $x$ in $K$ such that
$$f_i(x) < 0 \quad \text{for all } i = 1, \ldots, m. \qquad (1)$$

There exists a $y = (y_1, \ldots, y_m) \in R^m$ such that
$$y \neq 0, \quad y_i \ge 0 \text{ for all } i, \quad \text{and} \quad \sum_{i=1}^m y_i f_i(x) \ge 0 \text{ for all } x \in K. \qquad (2)$$



The concept underlying our discussion is that of "convexity with respect to a fixed cone $C$ in $R^m$" of a function $F: K \to R^m$ ($K$ being a convex subset of $R^n$). This definition turns out to be a natural extension of the case where each component of $F$ is convex in the usual sense, the latter being essentially the assumption of Theorem 1. One can then generalize Theorem 1 and its variants to a system of "cone" inequalities (Theorems 3 and 4).

The results discussed here can be shown to hold in a more general framework than the one imposed here; e.g., $R^n$ and $R^m$ may be replaced by normed linear spaces satisfying appropriate conditions. However, to be specific, we limit our discussion to the more restricted situation.



†This research has been partially supported by the Office of Naval Research under Contract Nonr-222(83) with the University of California. Reproduction in whole or in part is permitted for any purpose of the United States Government.


II. BASIC DEFINITIONS

For each positive integer $k$ we denote by $R^k$ the set of all real $k$-tuples $x = (x_1, \ldots, x_k)$; we sometimes write $R$ in place of $R^1$.

If $K$ is a subset of $R^k$, we say $K$ is convex providing $\lambda x + (1 - \lambda)x'$ is in $K$ whenever $x$ and $x'$ are both in $K$, $\lambda$ is in $R$, and $0 \le \lambda \le 1$.

If $C$ is a nonempty subset of $R^k$, we say that $C$ is a (convex) cone providing $C$ is convex and $\lambda x$ is in $C$ whenever $x$ is in $C$ and $\lambda$ is a nonnegative real number. Henceforth, we shall use "cone" and "convex cone" interchangeably. Equivalently, $C$ is a cone providing $\lambda x + \mu x'$ is in $C$ whenever $x$ and $x'$ are in $C$ and $\lambda$ and $\mu$ are both nonnegative real numbers.

If $x$ and $z$ are in $R^k$, $x = (x_1, \ldots, x_k)$, $z = (z_1, \ldots, z_k)$, we write $xz^T$ for the inner product of $x$ and $z$, i.e.,

$$xz^T = \sum_{i=1}^k x_i z_i.$$

We say that $x \ge z$ providing $x_i \ge z_i$ for all $i = 1, \ldots, k$. For any subset $S$ of $R^k$ we define the polar of $S$ to be

$$S^* = \{z \in R^k \mid zx^T \le 0 \text{ for all } x \in S\}.$$

It is clear that $S^*$ is always a closed, convex cone.
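Since the polar is defined here with the inequality $zx^T \le 0$, membership is easy to test when $S$ is a finitely generated cone. It suffices to check the generators, because a nonnegative combination of nonpositive inner products is nonpositive. Below is a small sketch; the function name and the floating-point tolerance are hypothetical.

```python
def in_polar_of_cone(z, generators, tol=1e-12):
    """Test z in C* = {z : z x^T <= 0 for all x in C} for C = cone(generators).

    Checking the generators suffices: z x^T <= 0 on each generator implies it
    on every nonnegative combination of them.
    """
    return all(sum(zi * gi for zi, gi in zip(z, g)) <= tol for g in generators)
```

For instance, the polar of the nonpositive orthant is the nonnegative orthant. This is the $C^*$ that turns Theorem 3 into Theorem 1.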

Whenever we use topological concepts such as "open set," "closed set," etc., it will always be with respect to the usual norm, i.e., $\|x\| = (\sum_i x_i^2)^{1/2}$. We shall use the following fundamental separation theorem, the proof of which may be found in the literature (cf. [1] or [4]).

Theorem 2: Let $K$ be a convex subset of $R^n$, and let $\bar{x}$ be a point in $R^n$ not contained in $K$. Then there must exist $a \in R^n$ and $\alpha \in R$ such that $a \neq 0$ and

$$ax^T \le \alpha \le a\bar{x}^T \quad \text{for all } x \in K. \qquad (3)$$

If, in addition, $K$ is closed, then we may assume that $\alpha < a\bar{x}^T$.

III. CONE FUNCTIONS

In the last analysis, whether the conclusion of Theorem 1 holds depends less on the properties of each of the functions $f_1, \ldots, f_m$ individually than on the way the $f_i$ are interrelated. That is, the relevant properties are those of the vector-valued function $F(x) = (f_1(x), \ldots, f_m(x))$. Specifically, to say that each $f_i$ is convex on $K$ is equivalent to saying that

$$\text{for each } x, z \in K,\ \lambda \in R,\ 0 \le \lambda \le 1: \quad G(x, z, \lambda) \le 0, \qquad (4)$$

where

$$G(x, z, \lambda) = F(\lambda x + (1 - \lambda)z) - \lambda F(x) - (1 - \lambda)F(z). \qquad (5)$$

One observes that (4) simply states that $G(x, z, \lambda)$ is in $R^m_-$, the nonpositive orthant of $R^m$, whenever $x, z \in K$ and $\lambda \in [0, 1]$. A very natural generalization of (4) is therefore the requirement that $G(x, z, \lambda)$ always be in some fixed cone $C$ in $R^m$. For functions $F$ satisfying that condition, one would expect an analog of Theorem 1 to hold, with "$f_i(x) < 0$ for all $i$" replaced by "$F(x) \in \mathrm{Interior}(C)$" and "$y_i \ge 0$" replaced by "$y \in C^*$." This turns out to be the case and is formalized in Theorem 3. Accordingly, we have:

Definition: Let $K$ be a convex subset of $R^n$. Let $C$ be a cone in $R^m$ and $F: K \to R^m$. We say that $F$ is a $C$-function providing $G(x, z, \lambda) \in C$ whenever $x, z \in K$ and $\lambda \in [0, 1]$, where $G$ is as in (5).

Let us illustrate the preceding definition. For $m = 1$, the only cones $C$ in $R^1$ are the origin, the closed rays from the origin, and the whole line:

- If $C$ is the origin, a $C$-function is a linear function.
- If $C$ is a ray, a $C$-function is convex or concave according to whether $C$ is the negative or the positive ray.
- If $C$ is the whole line, any function $F: K \to R^1$ is a $C$-function.

In general, if $C = R^m$ then every function is a $C$-function; however, Theorem 3 is then of no interest, because $F(x)$ is in the interior of $C$ for any $x$ in $K$.

For $m = 2$, the class of $C$-functions does not provide a great amount of new information either. The main reason is that the closed cones in $R^2$ are rather simple: they are all finite cones (i.e., sums of a finite number of rays).

In general, suppose $C$ is a finite cone in $R^m$, that is, there exists an $m \times k$ matrix $A$ such that $C = \{y \in R^m \mid yA \le 0\}$. Then $C$-functions may be characterized quite simply. Let $a^1, \ldots, a^k$ be the column vectors of $A$; then $F: K \to R^m$ is a $C$-function if, and only if, each of the functions $F(x)a^1, \ldots, F(x)a^k$ is convex on $K$. Thus, if $C$ is a finite cone, then questions concerning $C$-functions can be formulated in terms of a finite collection of convex functions. These, in turn, may be handled by reference to Theorem 1 or variants of it.
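The finite-cone characterization suggests a quick numerical sanity check. The sketch below samples points $x, z$ of $K$ and values $\lambda \in [0, 1]$, and tests whether $G(x, z, \lambda)a \le 0$ for every column $a$ of $A$. It can only refute the $C$-function property, never prove it.

The cone $C = \{y : y_1 \le 0,\ y_1 + y_2 \le 0\}$, the test functions, and the function names are assumptions chosen for illustration. With them, the sketch shows that $F(x) = (x^2, -x^2/2)$ is a $C$-function for this $C$, even though its second component is concave.

```python
import random


def looks_like_c_function(F, A_cols, K_sampler, trials=2000, tol=1e-9, seed=0):
    """Sampling check of the C-function condition for C = {y : y a <= 0 for all a in A_cols}.

    Returns False as soon as some sampled G(x, z, lam) has G a > tol for a column a
    (a genuine counterexample); True only means no counterexample was found.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x, z, lam = K_sampler(rng), K_sampler(rng), rng.random()
        w = [lam * xi + (1 - lam) * zi for xi, zi in zip(x, z)]
        # G(x, z, lam) = F(lam x + (1 - lam) z) - lam F(x) - (1 - lam) F(z), as in (5)
        G = [fw - lam * fx - (1 - lam) * fz for fw, fx, fz in zip(F(w), F(x), F(z))]
        if any(sum(g * ai for g, ai in zip(G, a)) > tol for a in A_cols):
            return False
    return True
```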

In case $C$ is not a finite cone (and thus $m \ge 3$), the property that $F$ is a $C$-function can no longer be stated in terms of the convexity of each of a finite collection of functions. In fact, one way of looking at Theorem 3, in case $C$ is not a finite cone, is as a generalization of Theorem 1 to a system with infinitely many inequalities.

IV. OPEN FEASIBILITY 

The analog of Theorem 1 for C-functions is: 

Theorem 3: Suppose $C$ is a cone in $R^m$ with nonempty interior. Let $K$ be a convex subset of $R^n$ and let $F: K \to R^m$ be a $C$-function. Then one, and only one, of the following statements holds:

There is an $x$ such that
$$x \in K \text{ and } F(x) \in \mathrm{Interior}(C). \qquad (6)$$

There exists a $y \in C^*$ such that
$$y \neq 0 \text{ and } F(x)y^T \ge 0 \text{ for all } x \in K. \qquad (7)$$

Before proving Theorem 3 we require two simple preliminary results: 

Lemma 1: Suppose $C$ is a convex cone in $R^m$, $y \in \mathrm{Interior}(C)$, $z \in C^*$, $z \neq 0$. Conclusion: $yz^T < 0$.

Proof: Suppose $y, z$ are as above and $yz^T \ge 0$. Since $y \in \mathrm{Interior}(C)$, there exists a $\delta > 0$ such that $w \in C$ whenever $\|w - y\| \le \delta$. Let $w$ be any such vector. Then $v = 2y - w$ is also in $C$, because $\|v - y\| = \|y - w\| \le \delta$. Now $y = \frac{1}{2}(v + w)$ with $v, w \in C$ and $z \in C^*$, so $vz^T \le 0$ and $wz^T \le 0$. Thus $0 \le yz^T = \frac{1}{2}(vz^T + wz^T) \le 0$, which forces $wz^T = 0$. We have just demonstrated that $wz^T = 0$ for all $w$ in some neighborhood of $y$, contradicting $z \neq 0$.

Lemma 2: Let $C$ be a convex cone in $R^m$, $y \in \mathrm{Interior}(C)$, $z \in C$, $\lambda \in R$, $\lambda > 0$. Conclusion: $y + z \in \mathrm{Interior}(C)$ and $\lambda y \in \mathrm{Interior}(C)$.

Proof: Since $y$ is in the interior of $C$, there exists a $\delta > 0$ such that $u \in C$ whenever $\|u - y\| \le \delta$. Now if $\|(y + z) - w\| \le \delta$, then $\|y - (w - z)\| \le \delta$ and so $w - z \in C$. But then $w = (w - z) + z \in C$, because $C$ is a cone. Thus $y + z$ is in the interior of $C$. For the statement $\lambda y \in \mathrm{Interior}(C)$ we have the following sequence of implications:

$$\|\lambda y - w\| \le \lambda\delta \implies \|y - \lambda^{-1}w\| \le \delta \implies \lambda^{-1}w \in C \implies w \in C,$$

and thus $\lambda y$ is in the interior of $C$.

Proof of Theorem 3: That (6) and (7) cannot hold simultaneously follows from Lemma 1. If (6) and (7) both hold, then $y \in C^*$, $y \neq 0$, $x \in K$ and $F(x) \in \mathrm{Interior}(C)$, and Lemma 1 gives $F(x)y^T < 0$, contradicting (7).

To show that either (6) or (7) holds, let us assume that (6) is false and consider the set

$$Y = \{y \mid \text{there exist } x \in K \text{ and } \bar{y} \in \mathrm{Int}(C) \text{ with } y = \bar{y} - F(x)\}.$$

That (6) is false is equivalent to saying that $y = 0$ is not a member of $Y$. We intend to show that $Y$ is convex. Then we apply Theorem 2 to $Y$, knowing it does not contain the origin, and thus obtain a $y$ satisfying (7).

Suppose we have $y_1, y_2 \in Y$, i.e., there exist $\bar{y}_1, \bar{y}_2 \in \mathrm{Int}(C)$ and $x_1, x_2 \in K$ with $y_k = \bar{y}_k - F(x_k)$, $k = 1, 2$. Now if $\lambda \in (0, 1)$, we wish to show $y = \lambda y_1 + (1 - \lambda)y_2 \in Y$. Let
$$u = \lambda\bar{y}_1 + (1 - \lambda)\bar{y}_2, \qquad v = F(\lambda x_1 + (1 - \lambda)x_2) - \lambda F(x_1) - (1 - \lambda)F(x_2);$$
then $y = u + v - F(\lambda x_1 + (1 - \lambda)x_2)$.

However, $\lambda\bar{y}_1 \in \mathrm{Interior}(C)$ and $(1 - \lambda)\bar{y}_2 \in C$, so by Lemma 2, $u \in \mathrm{Int}(C)$. Also, because $F$ is a $C$-function, $v \in C$, and thus by Lemma 2 we have $u + v \in \mathrm{Interior}(C)$. This shows that $Y$ is convex.

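Next, we know that $Y$ is convex and that $y = 0$ is not an element of $Y$. Applying Theorem 2, there exist an $a \in R^m$, $a \neq 0$, and an $\alpha \in R$ such that

$$[\bar{y} - F(x)]a^T \le \alpha \le 0 \quad \text{for all } \bar{y} \in \mathrm{Int}(C),\ x \in K. \qquad (8)$$

From Lemma 2 we know that $\lambda\bar{y} \in \mathrm{Interior}(C)$ whenever $\lambda > 0$ and $\bar{y} \in \mathrm{Interior}(C)$. Replacing $\bar{y}$ by $\lambda\bar{y}$ in (8), dividing by $\lambda$, and letting $\lambda \to \infty$, we infer that

$$\bar{y}a^T \le 0 \quad \text{for all } \bar{y} \in \mathrm{Int}(C). \qquad (9)$$

Now, by assumption there is a $\bar{y}_0 \in \mathrm{Interior}(C)$, so if $y \in C$ and $\lambda \ge 0$, then by Lemma 2, $\lambda y + \bar{y}_0 \in \mathrm{Interior}(C)$. Using (9), we see that

$$\lambda ya^T + \bar{y}_0a^T \le 0 \quad \text{for all } \lambda \ge 0,\ y \in C.$$

Thus $ya^T \le 0$ whenever $y \in C$, and consequently $a \in C^*$. Furthermore, for each $\lambda > 0$ we have $\lambda\bar{y}_0 \in \mathrm{Interior}(C)$ by Lemma 2, so from (8) we have

$$\lambda\bar{y}_0a^T - F(x)a^T \le \alpha \le 0 \quad \text{for all } \lambda > 0,\ x \in K.$$

Letting $\lambda \to 0$, it follows that $F(x)a^T \ge 0$ for all $x \in K$, and of course $a \neq 0$. Thus $y = a$ satisfies all the conditions in (7).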

As an immediate corollary of Theorem 3, one obtains Theorem 1 by letting $C$ be the nonpositive orthant and $F(x) = (f_1(x), \ldots, f_m(x))$.



V. CLOSED FEASIBILITY 

The statement of Theorem 1, and similarly of Theorem 3, is in many respects inadequate. One frequently needs, within the framework of Theorem 1 for example, to find an $x \in K$ with $f_i(x) \le 0$ for all $i$, rather than the strict inequalities of (1). Given that $K$ is closed, it would seem reasonable to replace (1) by weak inequalities, to replace (2) correspondingly by a statement of the form "there is a $y = (y_1, \ldots, y_m) \in R^m$ such that $y \ge 0$ and $\sum_{i=1}^m y_i f_i(x) > 0$ for all $x \in K$," and to expect a true statement. This is indeed the case when $K$ is a linear variety and all the $f_i$ are linear.

In general, with the $f_i$ merely convex, it may happen that there is no $x \in K$ such that $f_i(x) \le 0$ for all $i$, and yet there is no $y \ge 0$ in $R^m$ with $\sum_i y_i f_i(x) > 0$ for all $x \in K$. This is illustrated by $m = 2$, $n = 2$, $K = R^2_+$, $f_1(x_1, x_2) = x_2 - 1$, and $f_2(x_1, x_2) = 1 - x_1x_2(x_1 + x_2)^{-1}$, with $f_2(0) = 1$.

It is readily checked that $f_1$ and $f_2$ are convex. Furthermore, if $f_1(x) \le 0$ and $x$ is in $K$, then either $x_1x_2(x_1 + x_2)^{-1} < x_2 \le 1$ or $x_2 = 0$; in each case we have $f_2(x) > 0$. However, suppose there exist $y_1, y_2 \ge 0$ such that $y_1f_1(x) + y_2f_2(x) > 0$ for all $x$ in $K$, i.e.,

$$y_1(x_2 - 1) + y_2\left[1 - \frac{x_1x_2}{x_1 + x_2}\right] > 0 \quad \text{for all } x_1, x_2 \ge 0.$$

Letting $x_2 = 0$, $x_1 = 1$, we get $y_2 > y_1$. On the other hand, letting $x_1$ become arbitrarily large, we must have $y_1(x_2 - 1) + y_2(1 - x_2) \ge 0$ for all $x_2 \ge 0$, and thus $y_1 = y_2$, a contradiction.
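The counterexample can be made concrete. For any $y_1, y_2 \ge 0$, the argument above points to an explicit $x$ in $K$ where $y_1f_1(x) + y_2f_2(x) \le 0$, and the sketch below builds such a point.

The choices $t = 2 + 2/(y_2 - y_1)$ and $x_1 = y_2t^2 + 1$ are ours. They make $y_1f_1 + y_2f_2 = (y_1 - y_2)(t - 1) + y_2t^2/(x_1 + t) < -1$ at $(x_1, t)$ whenever $y_2 > y_1$. The function names are hypothetical.

```python
def f1(x1, x2):
    return x2 - 1.0


def f2(x1, x2):
    # 1 - x1*x2/(x1 + x2) on the nonnegative quadrant, with f2(0, 0) = 1
    s = x1 + x2
    return 1.0 if s == 0 else 1.0 - x1 * x2 / s


def witness(y1, y2):
    """A point of K = R^2_+ where y1*f1 + y2*f2 <= 0, for given y1, y2 >= 0."""
    if y2 <= y1:
        return (1.0, 0.0)              # value there is -y1 + y2 <= 0
    t = 2.0 + 2.0 / (y2 - y1)          # makes (y1 - y2)(t - 1) = -(y2 - y1) - 2
    x1 = y2 * t * t + 1.0              # makes y2 t^2 / (x1 + t) < 1
    return (x1, t)
```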

It thus follows, in particular, that the rephrasing of Theorem 3 in terms of weak inequalities need not always hold. That rephrasing replaces "Interior(C)" by "C" in (6) and "$F(x)y^T \ge 0$" by "$F(x)y^T > 0$" in (7). We can show, however, that under certain regularity assumptions on $C$, $K$ and $F$ the statement in question does hold:

Theorem 4: Let $C$, $K$ and $F$ be as in Theorem 3, and let

$$H = \{y \mid \text{there exist } x \text{ in } K \text{ and } \bar{y} \text{ in } C \text{ with } y = \bar{y} - F(x)\};$$

then $H$ is a convex set. Furthermore, suppose $H$ is also closed and the statement

There is an $x$ such that
$$x \text{ is in } K \text{ and } F(x) \text{ is in } C \qquad (10)$$

is false. Then the statement

There is a $y \in C^*$ such that
$$F(x)y^T > 0 \quad \text{for all } x \text{ in } K \qquad (11)$$

is a true statement.

Proof: As in the proof of Theorem 3, the convexity of $H$ is a direct consequence of the assumptions on $C$, $K$ and $F$. That (10) is false is equivalent to saying that $y = 0$ is not in $H$. Thus if $H$ is closed and convex, then, from Theorem 2, there must exist an $a \in R^m$ and an $\alpha \in R$ such that

$$ay^T \le \alpha < a0^T = 0 \quad \text{for all } y \in H, \qquad (12)$$

i.e.,

$$\bar{y}a^T - F(x)a^T \le \alpha < 0 \quad \text{for all } \bar{y} \in C,\ x \in K. \qquad (13)$$

However, $0 \in C$, and also $\lambda\bar{y} \in C$ whenever $\bar{y} \in C$ and $\lambda \ge 0$. Thus we infer from (13) that

$$\bar{y}a^T \le 0 \quad \text{for all } \bar{y} \in C, \qquad -F(x)a^T \le \alpha < 0 \quad \text{for all } x \in K, \qquad (14)$$

and (14) states that the vector $a$ is precisely the one required for (11).
Q.E.D.

Note: We have stated Theorem 4 in slightly different form than 
Theorem 3; however it is quite obvious that (10) and (11) cannot be simul- 
taneously true. 

It is worth noting that the condition that $H$ be closed, though certainly sufficient for (11) to hold in case (10) is false, is by no means necessary. For one thing, as is clear from the proof of Theorem 4, when $H$ is closed we actually get a positive lower bound for $F(x)y^T$ on $K$, namely the number $-\alpha$.

However, consider the case $m = n = 1$, $K = R_+$, $C = R_-$ and $F(x) = (1 + x)^{-1}$. Then $F$ is a $C$-function (because $F$ is convex). Yet the system $x \in K$, $F(x) \in C$ has no solution, because $F(x) > 0$ for all $x \in K$. Now $C^* = R_+$, and in fact any $y > 0$ in $C^*$ will satisfy (11). Nevertheless, no matter what $y \in C^*$ we take, $F(x)y^T$ has no positive lower bound as $x$ ranges over $K$: by letting $x$ become arbitrarily large, we can make $F(x)$ arbitrarily close to zero. Thus, in this case, $H$ is not closed. Other situations where the closedness of $H$ is not necessary arise when $K$ itself is not closed.

In a future note it is intended to relate to cone functions the following characterization of differentiable convex functions. Suppose $K$ is an open convex set in $R^n$ and $f: K \to R$ has all second partial derivatives; then $f$ is convex if, and only if, the quadratic form $[f_{ij}(x)]$ is positive semidefinite for each $x$ in $K$.
THE MAXIMUM TRANSFORM

R. E. Bellman and W. Karush

ABSTRACT 

Let the maximum (additive) convolution of two functions $f$ and $g$ of $n$ nonnegative real variables $x = (x_1, x_2, \ldots, x_n)$ be defined by

$$(f \oplus g)(x) = \max_{u + v = x} [f(u) + g(v)], \qquad u \ge 0,\ v \ge 0,\ x \ge 0.$$

This operation arises in optimization problems and prompts the consideration of transformations $Tf$ having the "disassociative" property

$$(*) \qquad T(f \oplus g) = Tf + Tg.$$

Let the maximum transform $Mf = \varphi$ be defined by

$$\varphi(\xi) = \sup_{x \ge 0} [-(\xi, x) + f(x)], \qquad (\xi, x) = \sum_{i=1}^n \xi_i x_i.$$

Then:

1. $M$ has property (*);
2. $Mf = M\hat{f}$, where $\hat{f}$ is the concave increasing "cap" of $f$;
3. if $Mf = \varphi$, then $M^{-1}\varphi = \hat{f}$.

[For closely related results, see W. Fenchel, "Convex Cones, Sets, and Functions," Lecture notes (1953), Department of Mathematics, Princeton University; there, $\varphi$ is called the conjugate of $f$, for $f$ concave.]

Let $T$ be an arbitrary transformation with property (*). Then:

(i) $Tf = T\hat{f}$;
(ii) $T = \Lambda M$ for some transformation $\Lambda$ such that $\Lambda(\varphi_1 + \varphi_2) = \Lambda\varphi_1 + \Lambda\varphi_2$.

This shows that $M$ is a "best" transformation $T$. Similar results apply to the

product convolution

$$(F \otimes G)(x) = \max_{uv = x} [F(u) \times G(v)].$$

The paper also includes a discussion of applications and of certain ex- 
tensions of the theory. 
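Property (*) for $M$ can be checked on a grid. The sketch below discretizes the one-variable case $n = 1$ on the integers $0, 1, \ldots, N$; this is an assumption, since the abstract works with real variables. On such a grid the identity $M(f \oplus g) = Mf + Mg$ holds exactly: the maximum over $u + v = x$, followed by the maximum over $x$, is a maximum over all pairs $(u, v)$. The function names are illustrative.

```python
def max_convolution(f, g):
    """(f (+) g)(x) = max over u + v = x of f(u) + g(v), with f, g listed on 0, 1, 2, ..."""
    return [max(f[u] + g[x - u] for u in range(len(f)) if 0 <= x - u < len(g))
            for x in range(len(f) + len(g) - 1)]


def max_transform(f, xis):
    """phi(xi) = max over grid points x of [f(x) - xi * x], for each xi in xis."""
    return [max(fx - xi * x for x, fx in enumerate(f)) for xi in xis]
```

For example, with f = [0, 3, 4, 4] and g = [1, 1, 5], the convolution is [1, 4, 5, 8, 9, 9]. Its transform at any slope equals the sum of the two separate transforms.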
REPRESENTATIONS FOR THE GENERALIZED INVERSE
OF MATRICES PARTITIONED AS A = [U, V]

R. E. Cline 

ABSTRACT 

The generalized inverse of a matrix has been used by L. D. Pyle in the 
development of an "interior gradient projection method" for solving linear 
programming problems. It is of interest, therefore, to have techniques for 
determining the generalized inverse of a matrix. 

Let $A$ designate any $m$ by $n$ matrix with complex elements. Partition $A$ arbitrarily as $A = [U, V]$, where the submatrices $U$ and $V$ have $k$ and $n - k$ columns, respectively, $1 \le k \le n - 1$. Designate the generalized inverse of any matrix $X$ as $X^+$ and the conjugate transpose as $X^*$. Then
Theorem 1 

Let
$$C = (I - UU^+)V$$
and
$$K = [I + (I - C^+C)V^*U^{+*}U^+V(I - C^+C)]^{-1}.$$
Then
$$[U, V]^+ = \begin{bmatrix} U^+ - U^+VC^+ - U^+V(I - C^+C)KV^*U^{+*}U^+(I - VC^+) \\ C^+ + (I - C^+C)KV^*U^{+*}U^+(I - VC^+) \end{bmatrix}.$$
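Theorem 1 can be checked numerically against a library pseudoinverse. Below is a minimal sketch assuming NumPy, with `numpy.linalg.pinv` supplying $U^+$ and $C^+$. The function name `cline_pinv` and the rank cutoff `rcond=1e-10` are our assumptions. The cutoff matters because $(I - UU^+)V$ is computed in floating point, so singular values that are exactly zero in exact arithmetic appear as tiny noise; the cutoff discards them.

```python
import numpy as np


def cline_pinv(U, V, rcond=1e-10):
    """Generalized inverse of A = [U, V] assembled by the formula of Theorem 1.

    X^+ is numpy.linalg.pinv (singular values below rcond times the largest
    are treated as zero); X^* is the conjugate transpose.
    """
    m, q = V.shape                                   # q = n - k columns of V
    I_m, I_q = np.eye(m), np.eye(q)
    Up = np.linalg.pinv(U, rcond)                    # U^+
    C = (I_m - U @ Up) @ V                           # C = (I - U U^+) V
    Cp = np.linalg.pinv(C, rcond)                    # C^+
    P = I_q - Cp @ C                                 # I - C^+ C
    W = V.conj().T @ Up.conj().T @ Up                # V^* U^{+*} U^+
    K = np.linalg.inv(I_q + P @ W @ V @ P)           # K of Theorem 1
    T = K @ W @ (I_m - V @ Cp)                       # K V^* U^{+*} U^+ (I - V C^+)
    top = Up - Up @ V @ Cp - Up @ V @ P @ T
    bottom = Cp + P @ T
    return np.vstack([top, bottom])
```

As a hand-checkable case, take $U = (1, 0)^T$ and $V$ with rows (1, 1) and (1, 2). Here $C^+C \ne I$, so the $K$ term is genuinely used. The formula reproduces $A^+ = A^T(AA^T)^{-1}$, whose rows are (5/6, -1/2), (1/3, 0), (-1/6, 1/2).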

Proceeding in the opposite direction, suppose that $A = [U, V]$ and that $A^+$ is known. Partition $A^+$ as
$$A^+ = \begin{bmatrix} G \\ H \end{bmatrix},$$
where $G$ has the dimensions of $U^*$ and $H$ has the dimensions of $V^*$. Then

Theorem 2

$$U^+ = G[I + V(I - HV)^+H]\{I - [H - (I - HV)(I - HV)^+H]^+[H - (I - HV)(I - HV)^+H]\}$$

$$V^+ = H[I + U(I - GU)^+G]\{I - [G - (I - GU)(I - GU)^+G]^+[G - (I - GU)(I - GU)^+G]\}$$

Using Theorems 1 and 2 and various corollaries derived therefrom, it 
is possible to combine the interior gradient projection method and the 
simplex method into a technique for solving linear programming problems. 
On Unimodular Sets of Vectors



Isidore Heller 

The definition of the concept of a unimodular set (Section 2) is followed
by a discussion of some of its properties (Sections 3-5) and a study of
inter-relations between classes of maximal unimodular sets (Sections 6-9).
The main result is a theorem on the union of two unimodular sets
(Section 9). The paper constitutes an introductory presentation on the
subject.



1. INTRODUCTION

In order to relate the definition to some familiar concept, we recall that given a group G of transformations on a set (say a vector space) V, and given a subset S of V, it is frequently of interest to study the structure of the subgroup H of G which is defined as the set of all those transformations that leave S invariant, that is

$H = \{T \in G \mid T(S) \subseteq S\}$ (1.1)

where T(S) is a short notation for the image of S under T. Conversely, given a subgroup H of G one may be interested in those subsets S of V for which H is the associated subgroup defined by (1.1).

It is a slight generalization of (1.1) if, instead of the subgroup H, we consider the subset H' of all those elements of G which map a specified subset S' of S into S, that is

$H' = \{T \in G \mid T(S') \subseteq S\}$ (1.2)

where H' will in general no longer be a group.

Conversely, having specified an S' for each subset S of V, and given a subgroup H of G, one may ask: for which subsets S of V is the [by definition (1.2)] associated H' contained in H? This question, for a special choice of S', G and H, leads to the definition of unimodular sets.



†This work was sponsored in part by The RAND Corporation and in part
by the National Science Foundation.
2. DEFINITION

For the purpose of this paper V is assumed to be a vector space of 
finite dimension over the field of real numbers. G denotes the group
of nonsingular linear transformations on V, U the unimodular subgroup of 
G, that is, the group of transformations whose determinants have absolute 
value 1. Without further loss of generality, the consideration is restricted 
to those subsets of V which are not contained in a proper subspace of V; 
then each such subset S contains some basis B of V. 

Definition: In V, a subset S containing some basis B of V is unimodular iff every nonsingular linear transformation that maps B into S is a unimodular transformation. Or, briefly:

S is unimodular iff, for some basis $B \subseteq S$,

$\{T \in G \mid T(B) \subseteq S\} \subseteq U$. (2.1)

This definition does not depend on the particular choice of B in S. That is, if B and C are two bases in S,

$H_1 = \{T \in G \mid T(B) \subseteq S\}, \quad H_2 = \{T \in G \mid T(C) \subseteq S\}$

and $H_1 \subseteq U$, then $H_2 \subseteq U$. To see this, let $T(C) \subseteq S$ and $T_1(B) = C$. Then $T_1 \in U$, and from $T(C) = TT_1(B) \subseteq S$ it follows that $TT_1 \in U$. Hence, $T = TT_1T_1^{-1} \in U$. This proves the asserted independence from the choice of basis, and thereby the equivalence of definition (2.1) to the formally more restrictive definition:

S is unimodular iff $\{T \in G \mid T(B) \subseteq S$ for some basis $B \subseteq S\} \subseteq U$ (2.2)

or, in other words: 

S is unimodular iff every two bases in S are related by a 
unimodular transformation. (2.3) 



3. PROPERTIES 

The motivation for the study of the concept will become clear from its 
properties. We shall discuss a few which are characteristic, that is, con- 
stitute necessary and sufficient conditions for a set S to be unimodular 
and can therefore serve as alternative definitions. 

Let $B = \{b_1, b_2, \ldots, b_n\}$ be a basis in S, and

$d = \sum_{i=1}^{n} \lambda_i b_i$ (3.1)

be the representation of a vector d of S in terms of B. 
If $\lambda_j \ne 0$, replacement of $b_j$ in B by d yields a new basis D in S. If T is a linear transformation mapping B onto D, the matrix $\tilde{T}$ of T with respect to the basis B is a permutation matrix except for the column of the jth unit vector, which is replaced by the sequence of the $\lambda_i$. Therefore, the determinant of $\tilde{T}$, and hence of T, equals $\pm\lambda_j$.

If S is unimodular, then T is a unimodular transformation, and hence $|\lambda_j|$ must equal 1. Since $\lambda_j$ was assumed $\ne 0$ but otherwise arbitrary, we have for unimodular S the property that in (3.1) $\lambda_i \in \{0, 1, -1\}$ for $i = 1, 2, \ldots, n$ and for every pair d, B in S.

Conversely, if a set S has that property, a linear transformation T satisfying T(B) = D will certainly have determinant ±1, and hence be unimodular, whenever the two bases B and D of S coincide in all but one vector (and trivially so when they coincide in all). If B and D have fewer or no vectors in common, it is readily seen that appropriate successive replacement of vectors in B by vectors of D leads to a sequence of bases $B = B_0, B_1, B_2, \ldots, B_k = D$, where every two consecutive ones differ in only one vector. Hence, T can be represented as a product of unimodular transformations $T = T_k \cdots T_2 T_1$, where $T_i(B_{i-1}) = B_i$ $(i = 1, 2, \ldots, k)$. Therefore, T is unimodular.

The two statements combine to this: 

S is unimodular iff, for every basis $B \subseteq S$ and every $d \in S$,

the coordinates of d with respect to B equal 0, 1, or -1. (3.2)
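Criterion (3.2) lends itself to a direct, if exponential, test. The sketch below rests on our own assumptions: vectors are integer tuples that span $R^n$, coordinates are computed exactly with Python's `fractions.Fraction`, and every n-subset of S that is a basis is tried. The helper names `coordinates` and `is_unimodular_by_coordinates` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def coordinates(basis, d):
    """Exact coordinates of d w.r.t. the vectors in `basis`, or None if they are dependent."""
    n = len(d)
    # Augmented matrix [b_1 ... b_n | d], one row per component.
    M = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(d[i])] for i in range(n)]
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return None  # the candidate vectors are linearly dependent
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c] / M[c][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def is_unimodular_by_coordinates(S):
    """Criterion (3.2): every d in S has coordinates in {0, 1, -1} w.r.t. every basis B in S."""
    n = len(S[0])
    for B in combinations(S, n):
        for d in S:
            lam = coordinates(B, d)
            if lam is None:
                break  # B is not a basis; move on to the next subset
            if any(abs(x) not in (0, 1) for x in lam):
                return False
    return True


print(is_unimodular_by_coordinates([(1, 0), (0, 1), (1, 1)]))  # True
print(is_unimodular_by_coordinates([(1, 0), (0, 1), (1, 2)]))  # False
```

In the second set the vector (1, 2) has the coordinate 2 with respect to the unit vectors, which already violates (3.2).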

Considering a fixed basis in S, it becomes immediate from (3.2) that a unimodular set in $V_n$ contains at most $3^n$ vectors. It should also be noted that weakening the conditions of (3.2) to read "some" instead of "every" in either one of the two occurrences would render them insufficient. However, the following weaker form still holds:

S is unimodular iff, for every basis $B \subseteq S$ and every $d \in S$,

the coordinates of d with respect to B are integers. (3.3)

To see that the condition of (3.3) implies the condition of (3.2), assume that the latter is not satisfied; that is, that for some basis B and element d of S, the representation (3.1) contains some $\lambda_j \notin \{0, 1, -1\}$. If $\lambda_j$ is not an integer, the condition of (3.3) already fails. Otherwise $|\lambda_j| \ge 2$, and replacement of $b_j$ in B by d yields a new basis in which $b_j$ has the representation

$b_j = \frac{1}{\lambda_j} d - \sum_{i \ne j} \frac{\lambda_i}{\lambda_j} b_i$,

where the coefficient of d is not an integer. Hence, the condition of (3.3) is not satisfied either.

For later reference we mention some other formulations of (3.3). If 
J(S) denotes the integral span of S, that is, the set of all linear combinations 
of elements of S with integral coefficients, then (3.3) reads 
S is unimodular iff, for every basis B of S, $S \subseteq J(B)$. (3.4)

Further, since $J(B) \supseteq S \iff J(B) \supseteq J(S)$, and since $B \subseteq S$ implies $J(B) \subseteq J(S)$, we have

S is unimodular iff, for every basis B in S, $J(B) = J(S)$. (3.5)

Each of the last two statements can be taken as the definition of unimodular 
sets S in a free abelian group of rank n, if B denotes a set of n linearly 
independent elements in S (and hence in general not necessarily a basis for 
the entire group). 

If S is in Euclidean $R^n$, it is convenient to interpret a basis B, in some arrangement of its vectors, as a matrix B, and to represent a linear transformation T by its matrix, also denoted T. Then the relation T(B) = D implies $TB = DP$, where P is an adequately chosen permutation matrix. Since $|\det P| = 1$, T is unimodular iff $|\det B| = |\det D|$. Hence, by (2.3),

S is unimodular iff the determinants associated with bases in

S have the same absolute value. (3.6)
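Criterion (3.6) replaces the coordinate test by a comparison of determinants. A minimal sketch, assuming integer vectors that span $R^n$ and exact arithmetic via `fractions.Fraction`; `det` and `is_unimodular_by_determinants` are illustrative names only. The second example shows that a unimodular set need not have determinants ±1: (3.6) only asks for a common absolute value.

```python
from fractions import Fraction
from itertools import combinations


def det(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in r] for r in rows]
    n = len(M)
    result = Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result  # a row swap flips the sign
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return result


def is_unimodular_by_determinants(S):
    """Criterion (3.6): all bases in S have determinants of the same absolute value."""
    n = len(S[0])
    values = {abs(det(B)) for B in combinations(S, n)} - {0}
    return len(values) <= 1


print(is_unimodular_by_determinants([(1, 0), (0, 1), (1, 1)]))  # True
print(is_unimodular_by_determinants([(2, 0), (0, 2), (2, 2)]))  # True (every basis has |det| = 4)
print(is_unimodular_by_determinants([(1, 0), (0, 1), (1, 2)]))  # False
```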

In $R^n$ the set S itself can, of course, for some arrangement of its vectors as columns, be interpreted as a matrix. However, care must be taken not to confuse the concept of a unimodular set with the (classical) concept of a unimodular transformation. To enforce the necessary distinction in cases where the set S takes visually the form of a matrix A, we shall have to specify whether we are concerned with A as representative of a set of vectors or as the matrix of a linear transformation, whenever this is not clear from the context. In order to avoid too lengthy terminology, we propose that
we propose that 

In conjunction with a matrix, "set" shall mean "set of 

columns" (3.7) 

so that when D is a matrix, expressions like "the set D is unimodular," 
"D is a unimodular set," will have a clear meaning. 

To obtain still another property of unimodular sets, we consider the 
system of linear equations 

$Ax = b$ (3.8)

where A is a matrix of n rows and of rank n, and hence the number of columns $k \ge n$.

Solving (3.8) can be interpreted as finding a representation of b in 
terms of the set A. If the columns of A entering a particular representa- 
tion with nonzero coefficients are linearly independent, then b appears 
represented in terms of some basis B of A, and the solution is called 
"basic." 
The existence of an integral solution is equivalent to $b \in J(A)$. If the set A is unimodular and $b \in J(A)$, then (3.5) asserts $b \in J(B)$ for every basis B in A, that is, all basic solutions are integral. Conversely, if for every $b \in J(A)$ all basic solutions are integral, that is, $b \in J(B)$ for every B in A, then $J(A) = J(B)$ for every B in A, and, by (3.5), A is a unimodular set. Hence, we have

The set A is unimodular iff the system Ax = b has the property
that all its basic solutions are integral whenever b is such that
an integral solution exists, that is, for every $b \in J(A)$. (3.9)

Properties (3.2) and (3.9) suggest the role of the concept in applications. 
Heuristically, the concept actually arose from these properties; (3.9) has 
to do with the existence of integral-valued solutions to systems of linear
equations, matrix games and linear programs; (3.2) relates to the possi- 
bility of simplified computational algorithms for the solution of such 
systems: it states that the unimodular property amounts to the absence of 
division in those iterative computational algorithms where each step in- 
volves transition from one basis to another by the exchange of a vector in 
the basis. 



4. GENERAL DEFINITION 

The definition (2.1) was restricted to those sets S in $V_n$ which contain a basis of $V_n$. If S in $V_n$ is of dimension k < n, the restriction in (2.1) is circumvented by considering S in its linear span L(S), which is a $V_k$. Then the definition (2.1) reads:

In $V_n$ a set S of dimension $k \le n$ is unimodular iff, for a given
subset $B_k$ of k linearly independent vectors of S, every
nonsingular linear transformation on the subspace $V_k = L(S)$
which maps $B_k$ into S is a unimodular transformation on $V_k$. (4.1)

This definition obviously includes (2.1) as the special case k = n. 

Of the statements in Sections 2 and 3, all but one remain true for k < n 
if "transformation" is taken to mean "transformation on L(S)," [hence, 
in particular, U is taken to denote the group of unimodular transformations 
on L(S)], and "basis" is taken to mean "basis of L(S)," that is, a maximal 
set of linearly independent vectors in S. The excepted statement is (3.6), which, for the general case $k \le n$, will receive a more direct formulation in (4.3) and (4.4) below. For this purpose, we first note:

If a linear transformation T on $V_n$ preserves the dimension of

S, then S is unimodular iff T(S) is unimodular. (4.2)

Proof. The restriction of T to $V_k = L(S)$, as an isomorphism between $V_k$ and $W_k = L(T(S))$, preserves linear relations both ways. Hence, (3.2) implies (4.2).

If now A is a matrix of rank k, $A_k = \{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}$ is a set of k independent rows of A, and T is the orthogonal projection on the linear span of the set of unit vectors $\{e_i \mid a_i \in A_k\}$, then (4.2) yields

If A is of rank k and $A_k$ is a submatrix consisting of k linearly
independent rows of A, then A is a unimodular set iff $A_k$ is a
unimodular set. (4.3)

Application of (3.6) to $A_k$ of (4.3) yields

If A is of rank k, then the set A is unimodular iff in each set
$A_k$ of k rows in A, the nonvanishing minors of order k have
the same absolute value. (4.4)

It should be noted that this value may vary with the choice of $A_k$. Application of (4.4) to A and its transpose $A^T$ suggests:

If A is of rank k, then the sets A and $A^T$ are both unimodular

iff all nonvanishing minors of order k have the same absolute

value. (4.5)

Proof. To see that the condition is necessary, let $M_1$, $M_2$ be nonsingular minors of order k, and $R_i$ and $C_i$ the set of rows and the set of columns of A which contain $M_i$. Then the minor P common to $R_1$ and $C_2$ is nonsingular, since the k by k coefficient matrix D representing $C_1$ in terms of $C_2$, that is, satisfying $C_2 D = C_1$, also satisfies $PD = M_1$. If the sets A and $A^T$ are both unimodular, then by (4.4) $|\det P| = |\det M_1|$ and $|\det P| = |\det M_2|$. Sufficiency follows directly from (4.4), applied to A and to $A^T$.

Of methodological interest for subsequent investigations is 

If S is unimodular, every subset of S is unimodular. (4.6) 

Proof. Let S be of dimension k, R a subset of dimension $r \le k$, and $d \in R$ represented in terms of a basis B in R by $d = \lambda_1 b_1 + \lambda_2 b_2 + \cdots + \lambda_r b_r$. Obviously d has the same representation in a basis of S which contains B. If S is unimodular, the $\lambda_i$ satisfy (3.2) and hence R is unimodular.

As a statement on matrices, (4.6), in conjunction with (4.3) and (4.4), 
reads: Let A be a matrix of rank k and let C, a submatrix consisting of 
columns of A, have rank r. If the nonvanishing minors of order k in some fixed set $A_k$ of k linearly independent rows of A have the same absolute value, then, in each set $C_r$ of r rows of C, the nonvanishing minors of order r have the same absolute value. It should be noted that the conclusion is not generally true for $A_r$ instead of $C_r$. Hence, $A_r$ need not be a unimodular set when the set A is unimodular.
5. TOTALLY UNIMODULAR SETS

Let S be a set of dimension n in $V_{n+k}$. If n linearly independent vectors in S are chosen as basis for a coordinate system in L(S), then S appears represented by a set $\bar{S}$ in $R^n$, and $\bar{S}$ contains the unit vectors of $R^n$. Since the transition from $L(S) = V_n$ to $R^n$ is an isomorphism, it is obvious that S is unimodular iff $\bar{S}$ is unimodular. Hence, the study of unimodular sets reduces to the study of those unimodular sets in $R^n$ which contain the unit vectors of $R^n$ (and hence, in particular, are of dimension n). Such a set, viewed as a matrix (for some arrangement of its vectors), is of the form $A = [I_n \mid D]$, where $I_n$ denotes the identity matrix of order n.

$A = [I_n \mid D]$ is a unimodular set iff every nonvanishing minor

in D (in A) has absolute value 1. (5.1)

Proof. Sufficiency is obvious. Necessity follows from (3.6) and the observation that the k columns of A which contain a given minor of order k can be completed to a basis by columns of $I_n$.

Originally the term "unimodular" was used by A. J. Hoffman and J. B. Kruskal [2] to characterize sets satisfying the condition of (5.1). In the context of our definition (2.1) it now becomes desirable to distinguish the special character of these sets. Following a suggestion of C. Berge and A. J. Hoffman, we define

In $R^n$, the set A is totally unimodular iff every nonvanishing

minor in A has absolute value 1. (5.2)
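Definition (5.2) can be tested by brute force over all square submatrices. The cost grows exponentially, so this sketch (with the hypothetical helper `is_totally_unimodular`) is only meant for small matrices. The example matrix is our own choice: the vertex-arc incidence matrix of a small directed graph, a standard example of a totally unimodular matrix.

```python
from fractions import Fraction
from itertools import combinations


def det(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in r] for r in rows]
    n = len(M)
    result = Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result  # a row swap flips the sign
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return result


def is_totally_unimodular(A):
    """(5.2): every nonvanishing minor of A, of any order, has absolute value 1."""
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                minor = det([[A[i][j] for j in cols] for i in rows])
                if minor not in (0, 1, -1):
                    return False
    return True


# Rows = vertices 1, 2, 3; columns = arcs 1->2, 2->3, 1->3 (+1 at the tail, -1 at the head).
incidence = [[1, 0, 1],
             [-1, 1, 0],
             [0, -1, -1]]
print(is_totally_unimodular(incidence))          # True
print(is_totally_unimodular([[1, 1], [-1, 1]]))  # False: the 2x2 minor equals 2
```

By (5.1), the same routine applied to a matrix of the form $[I_n \mid D]$ decides whether that set is unimodular.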

The following two statements are obvious: 

If A is a matrix and $A^T$ its transpose, then the set $A^T$ is totally
unimodular iff the set A is totally unimodular. (5.3)

In $R^n$ the set A is unimodular iff there exists a nonsingular

linear transformation T such that T(A) is a totally

unimodular set. (5.4)



6. CLASSES OF MAXIMAL UNIMODULAR SETS 

For brevity of exposition we return again to the practice adopted in 
Section 2: for a given V n consideration shall be restricted to sets of 
dimension n. 

Since subsets of a unimodular set are unimodular, the question as to 
which sets are unimodular reduces to the question as to which are the 
maximal unimodular sets. Clearly, a maximal unimodular set contains the
null vector and with each vector also its negative.

Further, since nonsingular linear transformations preserve the unimodularity and maximality of a set, two sets related by such a transformation can be considered as equivalent, thus leading, for each dimension n, to a collection of equivalence classes of maximal unimodular sets.

For a given dimension the number of distinct classes is finite, as is 
immediate from (3.2). 

One particular class, which we shall term "Class I," is well known. A member of Class I is, if we disregard the null vector, a set D of the form

$D = \{a_i - a_j\}$ $(i \ne j;\ i, j = 1, 2, \ldots, n+1)$ (6.1)

where the $a_i$ are n+1 affinely independent vectors, that is

$\sum_{i=1}^{n+1} \lambda_i a_i = 0,\ \sum_{i=1}^{n+1} \lambda_i = 0 \implies \lambda_i = 0$ $(i = 1, 2, \ldots, n+1)$ (6.2)

Geometrically D can be interpreted as the set of edges of an n-simplex in affine space (each edge taken in both orientations).

In $R^n$ a member D of Class I which contains the unit vectors can be viewed graph-theoretically as the set of paths in an oriented tree of n+1 vertices, if a path is represented by an incidence column which characterizes the edges of the tree that are contained in the path.

Major facts on Class I are the following:

(i) A unimodular set of dimension n belongs to Class I iff it
consists of n(n+1) vectors (not counting the null vector).

(ii) A maximal unimodular set of dimension n which is not in Class
I consists of fewer than n(n+1) vectors; such sets exist iff
$n \ge 4$. (6.3)
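The count in (6.3)(i) can be checked on a concrete member of Class I. The sketch assumes the particular affinely independent points $0, e_1, \ldots, e_n$ (an illustrative choice, not taken from the text), builds the set (6.1) from them, and tests unimodularity with the determinant criterion (3.6). The function names are hypothetical. Adding a vector outside the set, such as $e_1 + e_2 + e_3$ for n = 3, destroys unimodularity, as one expects of a maximal set.

```python
from fractions import Fraction
from itertools import combinations, permutations


def det(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in r] for r in rows]
    n = len(M)
    result = Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result  # a row swap flips the sign
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return result


def class_one_set(n):
    """Set (6.1) {a_i - a_j : i != j} for the points 0, e_1, ..., e_n of R^n."""
    points = [tuple(0 for _ in range(n))]
    points += [tuple(int(i == k) for k in range(n)) for i in range(n)]
    return [tuple(p - q for p, q in zip(a, b)) for a, b in permutations(points, 2)]


def is_unimodular(S):
    """Criterion (3.6): all bases in S have determinants of the same absolute value."""
    n = len(S[0])
    values = {abs(det(B)) for B in combinations(S, n)} - {0}
    return len(values) <= 1


for n in (1, 2, 3):
    D = class_one_set(n)
    print(n, len(D), len(D) == n * (n + 1), is_unimodular(D))
# 1 2 True True
# 2 6 True True
# 3 12 True True
print(is_unimodular(class_one_set(3) + [(1, 1, 1)]))  # False
```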

For proofs and further details see [3] and [4]. For another graph-theoretical interpretation and related facts see [2], [5] and [6]; for related concepts see [7].

While the preceding sections dealt with properties of an individual uni- 
modular set, the consideration now turns to properties that inter-relate 
distinct classes of unimodular sets; it is the objective of the remaining 
sections to obtain a fundamental property of this nature, namely the theorem 
of Section 9. This is achieved by a series of auxiliary statements concern- 
ing the representation of vectors in terms of certain bases (Section 7), the 
structure of the intersection of two unimodular sets (Section 8) and finally 
a comparison of classes of unimodular sets (Section 9). 



7. REPRESENTATION
Notation.

When $A \cap B = \emptyset$, we shall write A + B instead of $A \cup B$, and when $A \subseteq B$, we will write B - A for the complement of A in B. Conversely, the use of this notation shall mean that the assumptions are satisfied.

Further, for a set consisting of a single element x, we shall write x instead of {x} whenever the meaning is clear from the context. Thus, if $B = \{b_1, b_2, \ldots, b_n\}$, the notation $A = B - b_1 + a$ shall mean $A = \{a, b_2, \ldots, b_n\}$.

Finally, if B is a basis and x a vector in $V_n$, R(x,B) denotes the set of vectors in B that enter, with nonzero coefficients, the representation of x in terms of B. That is, if $x = \sum \lambda_i b_i$, then $R(x,B) = \{b_i \mid \lambda_i \ne 0\}$.

For later reference we first translate a few trivial facts into this 
notation. 

If B and C are bases, then $R(x,B) \subseteq C \iff R(x,B) = R(x,C)$ (7.1)

If B is a basis, and C = B - b + x, then C is a basis iff

$b \in R(x,B)$ (7.2)

If B is a basis, $b \in R(x,B) \cap R(y,B)$, and $C = B - b + x$,
then C is a basis and

(i) $R(y,C) \supseteq [R(x,B) \cup R(y,B)] - [R(x,B) \cap R(y,B)] + x$

(ii) $R(y,C) \subseteq [R(x,B) \cup R(y,B)] - b + x$

(iii) $R(y,C) \cap [B - R(y,B)] = R(x,B) \cap [B - R(y,B)]$

(iv) $R(y,C) \cap [B - R(x,B)] = R(y,B) \cap [B - R(x,B)]$ (7.3)

In (7.3) the last two relations are immediate consequences of the first two. We shall have use for the following form of (iii) and (iv):

If B is a basis, $B^+ \subseteq B$, $b \in R(x,B) \cap R(y,B)$, $b \notin B^+$, and
$C = B - b + x$, then C is a basis such that $B^+ \subseteq C$ and

(i) $R(x,B) \cap B^+ = \emptyset \implies R(y,C) \cap B^+ = R(y,B) \cap B^+$

(ii) $R(y,B) \cap B^+ = \emptyset \implies R(y,C) \cap B^+ = R(x,B) \cap B^+$. (7.4)

The following statement concerns representations within a unimodular set.

Let B be a basis in S, $x \in S$, $y \in S$, and

$R(x,B) \cap R(y,B) = B' \ne \emptyset$,
so that in the representations

$x = L_1(B') + L_2(R(x,B) - B')$, $\quad y = L_3(B') + L_4(R(y,B) - B')$

the linear combinations $L_1, \ldots, L_4$ have all nonzero coefficients. If S is unimodular, then

$L_3(B') = \pm L_1(B')$. (7.5)

Proof. By (3.2) all nonzero coefficients are ±1. Hence, (7.5) can be violated only when B' contains at least two vectors, say $b_1$, $b_2$, such that

$x = \lambda_1 b_1 + \lambda_2 b_2 + \cdots$, $\quad y = \varepsilon(\lambda_1 b_1 - \lambda_2 b_2) + \cdots$,

where $|\lambda_1| = |\lambda_2| = 1$ and $\varepsilon = \pm 1$. But then in the basis $C = B - b_1 + x$, the representation of y would be

$y = \varepsilon x - 2\varepsilon\lambda_2 b_2 + \cdots$,

where $b_2$ has a coefficient of absolute value 2, in contradiction to (3.2). Hence, (7.5) must hold.

We note that for unimodular sets (7.5) implies a sharper form of (7.3), namely,

If S is unimodular, B a basis in S, $x \in S$, $y \in S$, and

$b \in R(x,B) \cap R(y,B)$, then, for the basis $C = B - b + x$,

$R(y,C) = [R(x,B) \cup R(y,B)] - [R(x,B) \cap R(y,B)] + x$. (7.6)
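Statement (7.6) can be checked exhaustively on small examples. The sketch below is our own illustration, not part of the paper: vectors are referred to by their index in a list S, R(x,B) is computed as a set of indices from exact coordinates, and for every basis B, every pair x, y and every b in $R(x,B) \cap R(y,B)$ the set R(y,C) for C = B - b + x is compared with the right-hand side of (7.6). The helper names are hypothetical. For a set that is not unimodular the identity can fail, as the second example shows.

```python
from fractions import Fraction
from itertools import combinations


def coordinates(basis, d):
    """Exact coordinates of d w.r.t. the vectors in `basis`, or None if they are dependent."""
    n = len(d)
    M = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(d[i])] for i in range(n)]
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return None
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c] / M[c][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def check_7_6(S):
    """True iff R(y, B - b + x) equals the right-hand side of (7.6) in every admissible case."""
    n = len(S[0])
    R = {}  # R[(B, x)] = indices of the basis vectors entering the representation of S[x]
    for idx in combinations(range(len(S)), n):
        B = frozenset(idx)
        vectors = [S[i] for i in idx]
        for x in range(len(S)):
            lam = coordinates(vectors, S[x])
            if lam is None:
                break  # idx is not a basis
            R[(B, x)] = {idx[k] for k in range(n) if lam[k] != 0}
    bases = {B for (B, _) in R}
    for B in bases:
        for x in range(len(S)):
            for y in range(len(S)):
                Rx, Ry = R[(B, x)], R[(B, y)]
                for b in Rx & Ry:
                    C = (B - {b}) | {x}  # a basis by (7.2)
                    if R[(C, y)] != ((Rx | Ry) - (Rx & Ry)) | {x}:
                        return False
    return True


print(check_7_6([(1, 0), (0, 1), (-1, 0), (0, -1), (1, -1), (-1, 1)]))  # True
print(check_7_6([(1, 0), (0, 1), (1, 1), (1, -1)]))                     # False
```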



8. INTERSECTION OF TWO UNIMODULAR SETS 

Assumptions and notation. 

Throughout this section we are concerned with two sets, F and G, such 
that 

(i) both are unimodular 

(ii) each contains with x also -x 

(iii) their intersection $F \cap G = D$ contains a common basis B

(iv) there exist $f \in F$ and $g \in G$ such that $f \ne g$ and

$R(f,B) = R(g,B) = \bar{B}$. (8.0)

Without loss of generality, we may then assume

$f = b_1 + b_2 + \cdots + b_k - (b_{k+1} + b_{k+2} + \cdots + b_{k+s})$

$g = b_1 + b_2 + \cdots + b_k + b_{k+1} + \cdots + b_{k+s}$

for which, with $B^+ = \{b_1, \ldots, b_k\}$ and $B^- = \{b_{k+1}, \ldots, b_{k+s}\}$, we symbolically write

$f = \Sigma B^+ - \Sigma B^-$

$g = \Sigma B^+ + \Sigma B^- = \Sigma \bar{B}$. (8.1)

Obviously

$f \in F - D$, $\quad g \in G - D$ (8.2)

since otherwise f and g would both belong to the same unimodular set, and (8.1) would contradict (7.5).
We denote by

$\mathfrak{B} = \{B \mid B$ is a basis and $\bar{B} \subseteq B \subseteq D\}$. (8.3)

For the intersection D the given assumptions imply a special structure 
which will emerge from the study of representations in the remainder of 
this section. 



$x \in D$, $B \in \mathfrak{B}$, $R(x,B) \cap B^+ \ne \emptyset \implies R(x,B) \cap B^- = \emptyset$. (8.4)

Proof. In the representation of x consider the nonzero coefficients of vectors in $B^+$ and $B^-$ in conjunction with (7.5) and (8.1); since $x \in F$, those coefficients must be of opposite sign; since $x \in G$, they must be of the same sign. Hence, R(x,B) cannot intersect both $B^+$ and $B^-$.

$x \in D$, $y \in D$, $B \in \mathfrak{B}$, $R(x,B) \cap B^+ \ne \emptyset$, $R(x,B) \cap R(y,B) \ne \emptyset$

$\implies R(y,B) \cap B^- = \emptyset$. (8.5)

Proof. Assume $R(y,B) \cap B^- \ne \emptyset$. Then, by (8.4), $R(x,B) \cap B^- = \emptyset$ and $R(y,B) \cap B^+ = \emptyset$, hence $R(x,B) \cap R(y,B) \cap \bar{B} = \emptyset$. If now $b \in R(x,B) \cap R(y,B)$, then $b \notin \bar{B}$, hence $C = B - b + x \in \mathfrak{B}$. Then (7.6) implies $R(y,C) \cap \bar{B} \supseteq [R(x,B) \cup R(y,B)] \cap \bar{B}$, hence R(y,C) intersects $B^+$ as well as $B^-$, in contradiction to (8.4).

If we define

$D^+ = \{x \in D \mid R(x,B) \cap B^+ \ne \emptyset$ for some $B \in \mathfrak{B}\}$
$D^- = \{x \in D \mid R(x,B) \cap B^- \ne \emptyset$ for some $B \in \mathfrak{B}\}$
$D' = D - (D^+ \cup D^-)$ (8.6)

then obviously $D^+ \supseteq B^+$ and $D^- \supseteq B^-$. Furthermore,

$D^+$, $D^-$, $D'$ are pairwise disjoint. (8.7)

Proof. By its definition, D' is disjoint from $D^+$ and $D^-$. To show that $D^+$ and $D^-$ are disjoint, assume $x \in D^+ \cap D^-$. Then, for some $A \in \mathfrak{B}$ and some $C \in \mathfrak{B}$, we will have $R(x,A) \cap B^+ \ne \emptyset$, $R(x,C) \cap B^- \ne \emptyset$. Let $H = \bar{B} \cup R(x,A) \cup K$ be a maximal independent set in $\bar{B} \cup R(x,A) \cup R(x,C)$, and B an extension of H to a basis in D. Then, noting that $R(x,B) = R(x,A)$, we have $H \subseteq B \in \mathfrak{B}$, $R(x,B) \cap B^+ \ne \emptyset$. Note that $B \not\supseteq R(x,C)$, since otherwise uniqueness of representation would imply $R(x,C) = R(x,B)$ and hence, by (8.4), $R(x,C) \cap B^- = \emptyset$.

Let $y \in R(x,C)$, $y \notin B$. The assumption on H implies $R(y,B) \subseteq H$. Then $R(x,B) \cap R(y,B) \ne \emptyset$, since otherwise $y \in C$ and $R(y,B) \subseteq \bar{B} \cup K \subseteq C$ would imply $R(y,B) = R(y,C) = \{y\}$, hence $y \in B$, contradicting $y \notin B$. Hence, by (8.5), $R(y,B) \cap B^- = \emptyset$. Therefore, if we pass from the representation of x in terms of C, $x = L(R(x,C))$, to the representation of x in terms of B, $x = L_1(R(x,B))$, by means of substituting for each y in R(x,C) which is not in B its expression $y = L_2(R(y,B))$, then these substitutions will not affect elements in $B^-$, and hence lead to $R(x,B) \cap B^- = R(x,C) \cap B^- \ne \emptyset$, in contradiction to (8.4). This completes the proof of (8.7).

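$x \in D$, $y \in D$, $B \in \mathfrak{B}$, $R(x,B) \cap B^+ \ne \emptyset$, $R(x,B) \cap R(y,B) \ne \emptyset$

$\implies y \in D^+$. (8.8)

Proof. $R(y,B) \cap B^+ \ne \emptyset$ implies $y \in D^+$ by definition of $D^+$. $R(y,B) \cap B^+ = \emptyset$, in conjunction with (8.5), implies $R(y,B) \cap \bar{B} = \emptyset$. Hence, if $b \in R(x,B) \cap R(y,B)$, then $b \notin \bar{B}$, so that $C = B - b + x \in \mathfrak{B}$, and (7.4ii) yields $R(y,C) \cap B^+ = R(x,B) \cap B^+ \ne \emptyset$; thus $y \in D^+$.

As a special case of (8.8), we note

$x \in D$, $B \in \mathfrak{B}$, $R(x,B) \cap B^+ \ne \emptyset \implies [x \cup R(x,B)] \subseteq D^+$. (8.8a)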

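We are now prepared to prove the essential statement of this section:

(i) $x \in D^+ \implies R(x,B) \subseteq D^+$ for all $B \in \mathfrak{B}$

(ii) $x \in D^- \implies R(x,B) \subseteq D^-$ for all $B \in \mathfrak{B}$. (8.9)

Proof of (8.9i). Let $x \in D^+$. By (8.7), $x \notin D^-$, so that

$R(x,B) \cap B^- = \emptyset$ for all $B \in \mathfrak{B}$. (1)

Let

$H = \{A \in \mathfrak{B} \mid R(x,A) \cap B^+ \ne \emptyset\}$.

Then, for $B \in H$, (8.8a) implies $R(x,B) \subseteq D^+$.
Now assume $B \notin H$ and $R(x,B) \not\subseteq D^+$. Then

$R(x,B) = E + E^+$, $\quad \emptyset \ne E \subseteq D - D^+$, $\quad E^+ \subseteq D^+ - B^+$. (2)

Since $x \in D^+$, H is not empty. Let $C \in H$ be such that

$B \cap C$ is maximal in $\{B \cap A \mid A \in H\}$.
We first show

$C \supseteq E$ and $R(x,C) \cap E = \emptyset$. (3)

To prove the first relation, assume $y \in E$, $y \notin C$, and let $c \in R(y,C)$, $c \notin B$; such c exists, since otherwise $R(y,C) \subseteq B$ would, by uniqueness of representation, imply $R(y,C) = R(y,B) = \{y\}$, that is, $y \in C$, contradicting $y \notin C$. Then in particular $c \notin \bar{B}$, hence $C - c + y = C^* \in \mathfrak{B}$. Furthermore, $R(x,C^*) \cap B^+ = R(x,C) \cap B^+ \ne \emptyset$; this is obvious when $c \notin R(x,C)$, and in the other case it follows from $c \in R(x,C) \cap R(y,C)$, $R(y,C) \cap B^+ = \emptyset$ (since $y \notin D^+$) and (7.4i). Hence, $C^* \in H$. However, the relation $B \cap C^* = B \cap C + y$ contradicts the assumption that $B \cap C$ was maximal, since $y \notin C$ implies $y \notin B \cap C$. This proves $C \supseteq E$.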

The assertion $R(x,C) \cap E = \emptyset$ is a special case of $R(x,C) \subseteq D^+$, which follows from $C \in H$ and (8.8a). This completes the verification of (3).

We have thus

$R(x,C) = Q + Q^+$, $\quad Q \subseteq C - E - \bar{B}$, $\quad \emptyset \ne Q^+ \subseteq B^+$. (4)

Denoting by B' the set of those elements of R(x,B) which are not in C, it follows from (2) and $C \supseteq E$ that

$B' = R(x,B) - C \cap R(x,B) \subseteq E^+$. (5)

Given the representation of x in terms of B, we can substitute for each $y \in B'$ its representation in terms of C. This substitution should yield the (unique) representation of x in terms of C. Since $R(x,B) \supseteq E$ while $R(x,C) \cap E = \emptyset$, we must have $R(y,C) \cap E \ne \emptyset$ for some $y \in B'$. However, this is not possible, as we will show in the following two steps:

$y \in B' \implies R(y,C) \supseteq Q^+$ (6)

$y \in B' \implies R(y,C) \cap E = \emptyset$ (7)

Proof of (6). Assume $q \in Q^+$, $q \notin R(y,C)$. Let $c \in R(y,C)$, $c \notin B$; such c exists since otherwise $R(y,C) = R(y,B) = \{y\}$ and y would be in C. Then $C - c + y = C^* \in \mathfrak{B}$. Further, $q \in R(x,C)$ implies $q \in R(x,C^*)$; this is obvious when $c \notin R(x,C)$, and in the other case it follows from $q \notin R(y,C)$. Therefore $C^* \in H$. However, $B \cap C^* = B \cap C + y$ contradicts the assumption that $B \cap C$ was maximal, since $y \notin C$ implies $y \notin B \cap C$. This proves (6).

Proof of (7). For $y \in B'$, (6) implies $R(y,C) \cap B^+ \ne \emptyset$. Hence, by (8.8a), $R(y,C) \subseteq D^+$. Then, by definition of E in (2), $R(y,C) \cap E = \emptyset$. This proves (7) and hence completes the proof of (8.9i).

Proof of (8.9ii). By assumption (ii) of (8.0), $-f \in F$. Substitution of $-f$ for f in (8.1) interchanges $B^+$ and $B^-$ and subsequently $D^+$ and $D^-$. Then (8.9i) implies (8.9ii). This completes the proof of (8.9).

$x \in D' \implies R(x,B) \subseteq D'$ for all $B \in \mathfrak{B}$. (8.10)

Proof. If, for some $x \in D'$ and $B \in \mathfrak{B}$, $R(x,B) \not\subseteq D'$, let $b \in R(x,B)$ such that $b \in D - D' = D^+ + D^-$. If $b \in \bar{B} = B^+ + B^-$, then definition (8.6) implies $x \notin D'$, thus contradicting $x \in D'$. If $b \notin \bar{B}$, then $B - b + x = C \in \mathfrak{B}$ and $x \in R(b,C)$. Hence, $R(b,C) \cap D' \ne \emptyset$, which by (8.7) and (8.9) implies $b \notin (D^+ + D^-)$, contradicting our assumption on b. This proves (8.10).
The statements (8.9) and (8.10) combined achieve the aim of this section in the following assertion on the structure of the set D.
Let V(S) denote the linear span of the set S.

Under the assumptions (8.0) and definition (8.6), the three subspaces
$V^+ = V(D^+)$, $V^- = V(D^-)$, $V' = V(D')$ are pairwise disjoint (more precisely, their sum is direct). (8.11)

Proof. For an arbitrary but fixed $B \in \mathfrak{B}$, let $Q^+ = B \cap D^+$, $Q^- = B \cap D^-$, $Q' = B \cap D'$. By (8.7), $B = Q^+ + Q^- + Q'$; then, by (8.9) and (8.10): $V^+ = V(Q^+)$, $V^- = V(Q^-)$, $V' = V(Q')$. Since B is a basis, the assertion follows.



9. COMPARISON OF CLASSES 

A comparison of two classes of maximal unimodular sets of the same dimension is effected by comparing a pair of representatives, F and G. To this end, the representatives are so chosen that their intersection is maximal. That is, if $\mathcal{G}$ denotes the group of nonsingular linear transformations, we assume, for given G, F so chosen that

$F \cap G$ is maximal in $\{G \cap T(F) \mid T \in \mathcal{G}\}$. (9.1)

Theorem. 

Let two maximal unimodular sets, F and G, satisfy (9.1), let

$f \in F$, $g \in G$, and B be a common basis. Then

$R(f,B) = R(g,B) \implies f = g$, and hence $\{f, g\} \subseteq F \cap G$. (9.2)

Stated in other words, the theorem says: if f and g are not common elements, then $R(f,B) \ne R(g,B)$. To see the geometric meaning, note that R(x,B) associates to each x a minimal subspace among the subspaces spanned by subsets of B. Then (9.2) asserts about the union of F and G: if a subspace V spanned by a subset A of B contains x, y of $F \cup G$, then either x = y, or at least one of the two elements is in a subspace spanned by a proper subset of A.

Proof. We note first that each of F and G contains with x also -x (since each is maximal unimodular) and that $F \cap G$ contains a basis, as easily follows from (9.1). Further, if we assume that under the conditions of (9.2) $f \ne g$, then all assumptions of (8.0) appear satisfied. The special assumption (8.1) can be satisfied by replacing, if necessary, some of the vectors in B by their negatives so as to achieve nonnegative coefficients in the representation of g. Then (8.11) holds, and we define T by $Tx = -x$ when $x \in V^-$ and $Tx = x$ when $x \in V^+ + V'$. Then $T(D) = D$ and $Tf = g$. Hence, $G \cap T(F) \supseteq D + g$, in contradiction to assumption (9.1). This completes the proof of (9.2).
Dual Programs



Pierre Huard 



1. INTRODUCTION

This article is a direct application of the Kuhn-Tucker optimality conditions [1], well known to those who are interested in nonlinear programming. These conditions, necessary (subject to a weak hypothesis) and often sufficient (under the hypothesis that the functions are concave), are naturally at the root of the principal algorithms for solving such nonlinear programs, particularly with respect to the optimality tests of these algorithms.

Another interesting aspect of nonlinear programs, which has concerned certain authors in recent years, is how to generalize the well-known duality properties of linear programs to the nonlinear case. Of particular interest are the definition of the program dual to a given program and the theorem on the existence of a solution to the dual program when the primal program has a finite solution. After the articles of Dennis [2] and Dorn [3] concerning quadratic programming (linear constraints and a quadratic objective function), some more general results appeared, such as the articles of Dorn [4], Wolfe [5], and Hanson [6]. The goal of this article, whose principal results were established at the beginning of 1961 [7], is essentially to complete this work by establishing the second part of the duality theorem. As will be seen later, this reciprocity (the existence of a solution to the dual implying the existence of a finite solution to the primal) requires a supplementary hypothesis S, which distinguishes between the linear and nonlinear cases. This hypothesis S can be expressed as a regularity condition on a matrix. A sufficient condition for this hypothesis to be satisfied is that the objective function, or at least one of the constraints satisfied by the optimum of the dual program, be strictly concave in the neighborhood of this optimum [9].

We have tried to generalize, that is, to weaken our supplementary hy- 
pothesis. But we have been able to do no better than to apply the hypoth- 
esis S only to the nonlinear parts of the concave functions; this result, 
nevertheless, permits the theorem to be applied to partly linear functions, 
so that linear programs appear simply as a limiting case. 

Finally, to return to the conditions of Kuhn and Tucker, one finds that using them considerably shortens the proofs.
Notation

Matrices are denoted by capital letters, vectors by small letters, and scalars by Greek letters. The transposition sign T is used to denote row vectors. If A is a matrix and i and j are indices, then $A_i$ represents the ith row of A, $A^j$ represents the jth column of A, and $A_i^j$ represents the element in row i and column j of A. If a is a (column) vector and E a set of indices, then $a_E$ is the subvector of a whose rows are defined by the indices of E. If the scalar $\phi$ and the column vector $a$ are each functions of a vector $x$, then $(\partial\phi/\partial x)$ is a row vector whose jth component is $\partial\phi(x)/\partial x_j$, and $(\partial a/\partial x)$ is a matrix whose general element $(i, j)$ is $\partial a_i(x)/\partial x_j$.



2. MATRIX EXPRESSIONS OF THE KUHN-TUCKER CONDITIONS 

Let (P) be the following general program: maximize $\phi(x)$ under the conditions†

$$a(x) \ge 0 \qquad (2.1)$$

where $\phi$ is a scalar, $a$ is a vector, and both are continuously differentiable functions of a vector $x$.

The following hypothesis H [1] has the role of eliminating possible 
singular points of the domain defined by (2.1). 

Hypothesis H: Let $\bar x$ be the optimum of (P), E the set of indices of the constraints exactly satisfied by $\bar x$ (that is, those for which $a_E(\bar x) = 0$), and K the cone tangent at $\bar x$ defined by

$$(\partial a_E/\partial x)(x - \bar x) \ge 0$$

where the partial derivatives are evaluated at $x = \bar x$. It is assumed that for all points $x$ in K there is an arc tangent to $(x - \bar x)$ at the point $\bar x$ and entirely contained in the domain defined by (2.1).

Under hypothesis H, the following conditions are necessary for an optimum at $x = \bar x$.

There exists a vector $\bar u$ such that:

$$\bar u \ge 0 \qquad (2.2)$$

$$\bar u^T(\partial a/\partial x) + (\partial\phi/\partial x) = 0 \qquad (2.3)$$

$$\bar u^T a(\bar x) = 0 \qquad (2.4)$$

The vector $\bar u$ is indexed like the vector $a$, and the derivatives are evaluated at $x = \bar x$.



†The relations (2.1) may include nonnegativity constraints on certain components of $x$ as well as equality relations. For example, $b(x) = 0$ may be written $b(x) \ge 0$ and $-b(x) \ge 0$.
If $\phi$ and $a$ are concave functions of $x$, then the Kuhn-Tucker conditions are also sufficient. Solving the concave program (P) is then equivalent to solving the system formed by the relations (2.1) to (2.4).
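Conditions (2.1) to (2.4) are easy to check mechanically. The Python sketch below computes how far a candidate pair $(\bar x, \bar u)$ is from satisfying each condition, given the values and derivatives at $\bar x$. The function name `kkt_residuals`, the use of NumPy, and the one-variable test problem are illustrative assumptions of ours, not the author's.

```python
import numpy as np

def kkt_residuals(grad_phi, a_val, jac_a, u):
    """Violations of the Kuhn-Tucker conditions (2.1)-(2.4) for
    maximize phi(x) subject to a(x) >= 0, all data taken at x_bar.

    grad_phi : d(phi)/dx at x_bar, shape (n,)
    a_val    : a(x_bar), shape (m,)
    jac_a    : d(a)/dx at x_bar, shape (m, n)
    u        : candidate multipliers u_bar, shape (m,)
    """
    return {
        "(2.1) a >= 0": float(np.maximum(-a_val, 0.0).max(initial=0.0)),
        "(2.2) u >= 0": float(np.maximum(-u, 0.0).max(initial=0.0)),
        "(2.3) stationarity": float(np.abs(u @ jac_a + grad_phi).max(initial=0.0)),
        "(2.4) u^T a = 0": float(abs(u @ a_val)),
    }

# Hypothetical example: maximize -(x - 2)^2 subject to 1 - x >= 0.
# At x_bar = 1: d(phi)/dx = 2, a = 0, da/dx = -1, and u_bar = 2.
print(kkt_residuals(np.array([2.0]), np.array([0.0]),
                    np.array([[-1.0]]), np.array([2.0])))
```

All four residuals vanish at $(\bar x, \bar u) = (1, 2)$. With a wrong multiplier such as $u = 1$, the stationarity residual becomes 1.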

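It is easy to give these conditions when the program (P) itself has the following form.

Program (P):

$$\max\ \phi(y, z)$$
$$b(y, z) = 0 \qquad (2.1a)$$
$$c(y, z) \ge 0 \qquad (2.1b)$$
$$y \ge 0 \qquad (2.1c)$$
$$(z \text{ unrestricted})$$

Kuhn-Tucker conditions:

$$w \ge 0 \quad (v \text{ unrestricted}) \qquad (2.2a)$$
$$v^T(\partial b/\partial y) + w^T(\partial c/\partial y) + (\partial\phi/\partial y) \le 0 \qquad (2.3a)$$
$$v^T(\partial b/\partial z) + w^T(\partial c/\partial z) + (\partial\phi/\partial z) = 0 \qquad (2.3b)$$
$$w^T c = 0 \qquad (2.4a)$$
$$[v^T(\partial b/\partial y) + w^T(\partial c/\partial y) + (\partial\phi/\partial y)]\, y = 0 \qquad (2.4b)$$

The partial derivatives, as well as $y$ and $c$, are all evaluated at $(y, z) = (\bar y, \bar z)$.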



3. DEFINITION OF THE DUAL PROGRAMS 

Consider the general program (P) of the previous section: maximize $\phi(x)$ under the conditions

$$a(x) \ge 0 \qquad (3.1)$$

where the functions $\phi$ and $a$ are concave and twice continuously differentiable, and such that hypothesis H of Kuhn and Tucker is satisfied. We shall call the following program (D) the dual program of (P):

$$\text{minimize } \theta(x, u) = \phi(x) + u^T a(x)$$

under the conditions

$$(\partial\phi/\partial x) + u^T(\partial a/\partial x) = 0 \qquad (3.2)$$

$$u \ge 0 \qquad (3.3)$$

the program (P) being called the primal program of (D).

One can remark that the domain of (P), defined by (3.1), is convex since 
a is concave. 

Further, since $a$ and $\phi$ are concave, $\theta$ is a concave function of $x$ for every fixed $u \ge 0$. Nothing of this kind, however, can be said about the constraint (3.2).

Finally, for any given $u \ge 0$, $u \ne 0$, the point $x$ (if it exists) such that $(x, u)$ belongs to the domain of (D) is unique if $\theta$ is strictly concave as a function of $x$. For that, it is sufficient that $\phi$, or at least one of the components $a_h$ of $a$ corresponding to $u_h > 0$, be strictly concave. Indeed, under these conditions $\theta$ is strictly concave with respect to $x$, and the relations (3.2), which can be written

$$\partial\theta/\partial x_j = 0 \quad \text{for all } j \text{ in } J$$

admit a unique solution, namely the point at which the strictly concave function $\theta$ attains its maximum for the given value of $u$.
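The uniqueness argument can be made explicit when $\phi$ is a strictly concave quadratic. For illustration only, assume $\phi(x) = p^T x - \tfrac12 x^T Q x$ with $Q$ symmetric positive definite, and linear constraints $a(x) = b - Ax$. Then (3.2) reads $p^T - x^T Q - u^T A = 0$, whose unique solution is $x(u) = Q^{-1}(p - A^T u)$. The sketch below computes this point and the dual objective $\theta(x(u), u)$. The function name and the data are hypothetical.

```python
import numpy as np

def dual_point(Q, p, A, b, u):
    """Return the unique x with (x, u) satisfying (3.2), and theta(x, u),
    for phi(x) = p^T x - 0.5 x^T Q x (Q symmetric positive definite)
    and a(x) = b - A x."""
    x = np.linalg.solve(Q, p - A.T @ u)      # (3.2): Q x = p - A^T u
    phi = p @ x - 0.5 * x @ Q @ x
    theta = phi + u @ (b - A @ x)            # theta(x, u) = phi(x) + u^T a(x)
    return x, theta

# Hypothetical data: phi(x) = 4x - x^2, constraint 1 - x >= 0.
Q, p, A, b = np.array([[2.0]]), np.array([4.0]), np.array([[1.0]]), np.array([1.0])
for u in (0.0, 2.0, 4.0):
    x, theta = dual_point(Q, p, A, b, np.array([u]))
    print(u, x, theta)
```

For these data the primal optimum is $\bar x = 1$ with $\phi(\bar x) = 3$. The dual values for $u = 0, 2, 4$ are 4, 3 and 4, so the smallest of them is attained at $u = 2$, where $x(u) = 1$.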

4. DUALITY THEOREM 

The optimal solutions of the programs (P) and (D) defined in section 3 
are described by the following theorem. 

Theorem 

1) If $\bar x$ maximizes the program (P), there exists a vector $\bar u$ such that $(\bar x, \bar u)$ is an optimal solution of program (D).

2) Conversely, if $(\bar x, \bar u)$ minimizes the program (D) and if, in addition, the following supplementary hypothesis (S) is satisfied

(S): The matrix $(\partial^2\theta/\partial x^2)$ is nonsingular at $(\bar x, \bar u)$,

then the vector $\bar x$ is a solution of the program (P).

3) In both cases one has

$$\max \phi(x) = \min \theta(x, u).$$

Proof 

Part 1. If $\bar x$ is an optimal solution of (P), then there exists $\bar u$ such that

$$a \ge 0 \qquad (4.1)$$

$$\bar u \ge 0 \qquad (4.2)$$

$$\bar u^T(\partial a/\partial x) + (\partial\phi/\partial x) = 0 \qquad (4.3)$$

$$\bar u^T a = 0 \qquad (4.4)$$

where $a$ and the derivatives are evaluated at $x = \bar x$.

Relation (4.1) merely states that $\bar x$ is feasible for (P), while the other relations are those of Kuhn and Tucker.

It is clear from relations (4.2) and (4.3) that $(\bar x, \bar u)$ is a feasible solution of (D).

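Further, if $(x, u)$ is any feasible solution whatever of (D), we have, with the derivatives evaluated at $x$:

$$
\begin{aligned}
\theta(\bar x, \bar u) - \theta(x, u) &= \phi(\bar x) - \phi(x) + \bar u^T a(\bar x) - u^T a(x) && \text{by definition}\\
\phi(\bar x) - \phi(x) &\le (\partial\phi/\partial x)(\bar x - x) && \text{because } \phi \text{ is concave}\\
\bar u^T a(\bar x) &= 0 && \text{from (4.4)}\\
-u^T a(x) &\le u^T(\partial a/\partial x)(\bar x - x) - u^T a(\bar x) && \text{because } a \text{ is concave and } u \ge 0\\
-u^T a(\bar x) &\le 0 && \text{from (4.1) and (3.3), } (x, u) \text{ being feasible for (D)}
\end{aligned}
$$

By addition,

$$\theta(\bar x, \bar u) - \theta(x, u) \le [(\partial\phi/\partial x) + u^T(\partial a/\partial x)](\bar x - x).$$

With (3.2) we obtain finally:

$$\theta(\bar x, \bar u) - \theta(x, u) \le 0 \qquad (4.5)$$

Since relation (4.5) holds for every feasible solution $(x, u)$ of (D), it follows that $(\bar x, \bar u)$ is an optimal solution of (D). On the other hand,

$$\theta(\bar x, \bar u) = \phi(\bar x) + \bar u^T a(\bar x) = \phi(\bar x) \qquad (4.6)$$

follows from (4.4), which concludes the proof of the first part of the theorem.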
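The inequalities above can be checked numerically on a small example of our own: maximize $\phi(x) = -(x-2)^2$ subject to $a(x) = 1 - x \ge 0$, whose optimum is $\bar x = 1$ with $\phi(\bar x) = -1$. Here (3.2) gives $u = 4 - 2x$, so (3.3) requires $x \le 2$. The sketch samples both feasible sets on a grid. This is a check, not a solution method. It confirms that no dual value lies below any primal value, and that the two optima coincide at $(\bar x, \bar u) = (1, 2)$, as the theorem asserts.

```python
import numpy as np

def phi(x):
    return -(x - 2.0) ** 2            # concave objective of (P)

def a(x):
    return 1.0 - x                    # constraint a(x) >= 0 (linear, hence concave)

def theta(x, u):
    return phi(x) + u * a(x)          # objective of the dual program (D)

# Primal-feasible points: a(x) >= 0 means x <= 1.
xp = np.linspace(-3.0, 1.0, 401)
# Dual-feasible points: (3.2) reads -2(x - 2) - u = 0, so u = 4 - 2x,
# and (3.3), u >= 0, requires x <= 2.
xd = np.linspace(-3.0, 2.0, 501)
ud = 4.0 - 2.0 * xd

best_primal = phi(xp).max()           # approximates max phi over (P)
best_dual = theta(xd, ud).min()       # approximates min theta over (D)
# Every dual value is at least every primal value (consequence of Part 1).
weak_duality_holds = bool(best_dual >= best_primal - 1e-12)
print(best_primal, best_dual, weak_duality_holds)
```

On the dual-feasible set, $\theta = x^2 - 2x = (x-1)^2 - 1$, so the grid minimum sits at $x = 1$, $u = 2$.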

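Part 2. The vectors $(\bar x, \bar u)$ form an optimal solution to (D) satisfying the relations

$$(\partial\phi/\partial x) + \bar u^T(\partial a/\partial x) = 0 \qquad (4.7)$$

$$\bar u \ge 0 \qquad (4.8)$$

where the derivatives are evaluated at $x = \bar x$.

We shall suppose that hypothesis H of Kuhn and Tucker is satisfied at this point $(\bar x, \bar u)$ relative to the domain of (D). Under this hypothesis the conditions of Kuhn and Tucker are necessary. They can be written by applying the particularized formulas of section 2 to the program (D), treated as the maximization of $-\theta$, with $u$ in the role of $y$, $x$ in the role of $z$, and an unrestricted multiplier vector $v$ attached to the equality constraint (3.2):

$$(\partial a/\partial x)v - a \le 0 \qquad (4.9)$$

$$(\partial[(\partial\phi/\partial x) + \bar u^T(\partial a/\partial x)]/\partial x)\, v = 0 \qquad (4.10)$$

$$\bar u^T[(\partial a/\partial x)v - a] = 0 \qquad (4.11)$$

The derivatives are evaluated at $x = \bar x$.

The supplementary hypothesis S, introduced in the second part of the statement of the theorem, assures us that the solution $v$ of the homogeneous system (4.10) is unique, and therefore

$$v = 0 \qquad (4.12)$$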
Under these conditions, it is easy to verify that the relations (4.7) to (4.12) imply the optimality conditions (2.1) to (2.4) relative to the program (P), for $x = \bar x$ and $u = \bar u$:

(4.9) and (4.12) imply (2.1)

(4.8) implies (2.2)

(4.7) implies (2.3)

(4.11) and (4.12) imply (2.4).

Since $a$ and $\phi$ are concave functions, the optimality conditions (2.1) to (2.4) are sufficient, and thus $\bar x$ is an optimal solution of the program (P). Further, (4.11) and (4.12) imply

$$\theta(\bar x, \bar u) = \phi(\bar x) + \bar u^T a(\bar x) = \phi(\bar x) \qquad (4.13)$$

which concludes the proof of the second part of the theorem.
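Hypothesis S can be tested numerically once the Hessians are available. By the definition of $\theta$, the matrix of (4.10) is $\partial^2\theta/\partial x^2 = \partial^2\phi/\partial x^2 + \sum_i \bar u_i\, \partial^2 a_i/\partial x^2$. The sketch below forms this matrix from user-supplied Hessians and tests nonsingularity through its smallest singular value. The helper name and the relative tolerance are our own assumptions. The second call uses a hypothetical partially linear instance: the variable entering only linearly yields a zero row and column, and the test fails.

```python
import numpy as np

def hypothesis_S_holds(hess_phi, hess_a, u, rel_tol=1e-10):
    """Test hypothesis (S): the matrix d^2 theta / dx^2 of relation (4.10),
    i.e. Hess(phi) + sum_i u_i Hess(a_i), is nonsingular at (x_bar, u_bar).

    hess_phi : Hessian of phi at x_bar, shape (n, n)
    hess_a   : Hessians of the components a_i at x_bar, shape (m, n, n)
    u        : multipliers u_bar, shape (m,)
    """
    H = hess_phi + np.tensordot(u, hess_a, axes=1)  # sum over i of u_i * Hess(a_i)
    sv = np.linalg.svd(H, compute_uv=False)          # singular values of H
    return bool(sv.min() > rel_tol * max(1.0, sv.max()))

# Strictly concave case: phi = -(x - 2)^2, a = 1 - x, u_bar = 2 (prints True).
print(hypothesis_S_holds(np.array([[-2.0]]), np.zeros((1, 1, 1)), np.array([2.0])))
# Partially linear case: phi = -(x - 2)^2 + y, a = 1 - x - y (prints False).
print(hypothesis_S_holds(np.array([[-2.0, 0.0], [0.0, 0.0]]),
                         np.zeros((1, 2, 2)), np.array([1.0])))
```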

5. MODIFICATION OF HYPOTHESIS S AND OF THE THEOREM FOR THE 
PARTIALLY LINEAR CASE. 

Hypothesis S, introduced for the proof of the second part of the theorem, is clearly too strong a criterion for selecting the programs that have suitable duality properties. This appears clearly in the cases where $\phi$ and $a$ are partially linear, that is, when the functions can be put in the form

$$\phi(x, y) = \phi'(x) + f^T y \qquad (5.1)$$

$$a(x, y) = a'(x) + Ay - \alpha \qquad (5.2)$$

where $x$ and $y$ form a partition of the vector variable. Here $\phi'$ and $a'$ are concave scalar and vector functions, $f$ and $\alpha$ are constant vectors, and $A$ is a constant matrix.

The matrix of relation (4.10), which enters into hypothesis S, can be written in the form

$$\begin{bmatrix} \partial[\bar u^T(\partial a'/\partial x) + (\partial\phi'/\partial x)]/\partial x & 0 \\ 0 & 0 \end{bmatrix} \qquad (5.3)$$

This matrix is clearly singular, and hypothesis S is not satisfied. But in the limiting case where $\phi$ and $a$ are completely linear, the corresponding program of course satisfies the linear duality theorem, which, although different from the theorem established above, is rather analogous. The essential difference is that in the linear case the program (D) does not contain terms in $x$. As a consequence, only the first part of the theorem is always valid (the dual variable $u$ is the optimal solution of (D), and the variable $x$ simply becomes useless). As for the second part, as formulated it clearly cannot yield the solution of (P) from that of (D), since (D) no longer contains $x$.
It thus seems useful to formulate the duality theorem in a more general form that applies to all cases: linear, nonlinear, or partially linear. This result is obtained very simply by considering the functions $\phi$ and $a$ written in the form of (5.1) and (5.2). The programs (P) and (D), as well as the corresponding Kuhn-Tucker conditions, become:

Program (P):

$$\max\ \phi'(x) + f^T y$$
$$a'(x) + Ay - \alpha \ge 0 \qquad (5.4)$$
$$(x \text{ unrestricted}), \quad (y \text{ unrestricted})$$

Necessary and sufficient Kuhn-Tucker conditions:

$$\bar u \ge 0$$
$$\bar u^T(\partial a'/\partial x) + (\partial\phi'/\partial x) = 0 \quad \text{at } x = \bar x$$
$$\bar u^T A + f^T = 0$$
$$\bar u^T[a'(\bar x) + A\bar y - \alpha] = 0$$

Program (D):

$$\min\ \phi'(x) + u^T a'(x) - u^T\alpha$$
$$u^T(\partial a'/\partial x) + (\partial\phi'/\partial x) = 0 \qquad (5.5)$$
$$u^T A + f^T = 0 \qquad (5.6)$$
$$u \ge 0 \qquad (5.7)$$
$$(x \text{ unrestricted})$$

[The expression for the objective function of program (D), which does not contain $y$, has been simplified by using (5.6).]

Necessary Kuhn-Tucker conditions ($v'$ unrestricted, $w$ unrestricted):

$$(\partial a'/\partial x)v' + Aw - a' + \alpha \le 0 \qquad (5.8)$$
$$(\partial[\bar u^T(\partial a'/\partial x) + (\partial\phi'/\partial x)]/\partial x)\, v' = 0 \qquad (5.9)$$
$$\bar u^T[(\partial a'/\partial x)v' + Aw - a' + \alpha] = 0 \qquad (5.10)$$

[In the last three relations, $a'$ and $\phi'$ and their derivatives are evaluated at $x = \bar x$.]

If one calls $(\bar x, \bar u)$ the optimal solution of (D), and $(\bar v', \bar w)$ the corresponding dual variables, one can easily verify that if the matrix of relation (5.9)

$$(\partial[\bar u^T(\partial a'/\partial x) + (\partial\phi'/\partial x)]/\partial x) \qquad (5.11)$$

evaluated at $(\bar x, \bar u)$ is nonsingular, then an optimal solution of (P) is given by

$$x = \bar x, \quad y = -\bar w$$

with the dual variable $u = \bar u$. We thus recover the properties of the duality theorem stated above for the nonlinear part, namely the functions $\phi'$ and $a'$, and the classical duality properties of linear programs for the linear parts involving the variable $y$. In particular, the dual variable of (D) relative to the constraint (5.6), which is independent of $x$, represents (up to sign) the "linear" part of the optimal solution of (P).

With the objective function and the constraints of (P) written in the form (5.1) and (5.2), the duality theorem may be modified as follows.

Theorem

1) If $(\bar x, \bar y)$ maximizes the program (P), there exists a vector $\bar u$ such that $(\bar x, \bar u)$ is an optimal solution of the program (D).

2) Conversely, if $(\bar x, \bar u)$ minimizes the program (D), then, under the following supplementary hypothesis

(S): The square matrix $(\partial[\bar u^T(\partial a'/\partial x) + (\partial\phi'/\partial x)]/\partial x)$ is nonsingular at $x = \bar x$,

there exists a vector $\bar w$ such that $(\bar x, -\bar w)$ is an optimal solution of the program (P).

3) In both cases one has

$$\max \phi(x, y) = \min \theta(x, u).$$

The first and second parts now have a more symmetric form as far as the extra variables are concerned.
SYMMETRIC DUAL QUADRATIC PROGRAMS

Richard W. Cottle

ABSTRACT

The purpose of this paper is to exhibit a pair of quadratic programs which are symmetric, dual, and related to one which is self-dual. Symmetry, duality, and self-duality in quadratic programming have each been treated by W. S. Dorn in separate publications. It is believed that the programs offered here encompass all these features and hence tend to unify the theory.

Specifically, the programs are:

(1) Primal problem

$$\text{Minimize } F(x, y) = \tfrac12 y^T C^* y + \tfrac12 x^T C x + p^T x$$
$$\text{subject to } C^* y + Ax \ge -b$$
$$\text{and } x \ge 0$$

(2) Dual problem

$$\text{Maximize } G(u, v) = -\tfrac12 v^T C^* v - \tfrac12 u^T C u - b^T v$$
$$\text{subject to } A^T v + Cu \le p$$
$$\text{and } v \ge 0$$

where $C$ and $C^*$ are symmetric, positive semidefinite matrices.

The symmetric dual programs of linear programming can be obtained from (1) and (2) by setting $C$ and $C^*$ equal to zero matrices. Setting just $C^*$ equal to zero yields the dual problems of Dorn.

The duality of (1) and (2) is proved by means of the duality theorem of linear programming. It is a consequence of the demonstration that

Theorem: If either problem (1) or (2) has an optimal solution, there exists a common optimal solution for (1) and (2).

Theorem: If both (1) and (2) are feasible, then both (1) and (2) have optimal solutions.

It is shown that there exists a (primal) quadratic program which is formally self-dual. Here the results are analogous to those in linear programming and, in a sense, generalize them.
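The pair (1) and (2) satisfies weak duality. If $x \ge 0$, $C^*y + Ax \ge -b$, $v \ge 0$ and $A^Tv + Cu \le p$, then $F(x,y) - G(u,v) \ge \tfrac12 (x+u)^T C (x+u) + \tfrac12 (y-v)^T C^* (y-v) \ge 0$. This short derivation is ours and is not stated in the abstract. It bounds $p^Tx$ from below using the dual constraints, then $v^TAx$ using the primal constraints. The sketch below checks the inequality on random data. As a device of our own, $b$ and $p$ are built from randomly chosen points so that both points are feasible by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(k):
    M = rng.standard_normal((k, k))
    return M @ M.T                     # symmetric positive semidefinite

n, m = 3, 2                            # sizes of x and y (arbitrary)
C, Cs = random_psd(n), random_psd(m)   # Cs plays the role of C*
A = rng.standard_normal((m, n))

# Pick the points first, then choose b and p so that both are feasible.
x = rng.random(n)                      # x >= 0
y = rng.standard_normal(m)             # y unrestricted
v = rng.random(m)                      # v >= 0
u = rng.standard_normal(n)             # u unrestricted
b = -(Cs @ y + A @ x) + rng.random(m)  # primal (1): C* y + A x >= -b
p = A.T @ v + C @ u + rng.random(n)    # dual (2):   A^T v + C u <= p

F = 0.5 * y @ Cs @ y + 0.5 * x @ C @ x + p @ x
G = -0.5 * v @ Cs @ v - 0.5 * u @ C @ u - b @ v
bound = 0.5 * (x + u) @ C @ (x + u) + 0.5 * (y - v) @ Cs @ (y - v)
print(F - G, bound)                    # expect F - G >= bound >= 0
```

Setting `C` and `Cs` to zero matrices reduces the check to ordinary weak duality for the symmetric linear programs.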
ORTHOGONALITY, DUALITY, AND QUADRATIC TYPE PROBLEMS

IN MATHEMATICAL PROGRAMMING

C. E. Lemke

ABSTRACT

In this paper a number of problems of near-linear and quadratic-like character are considered, all of which may be posed as linear programs subject to additional orthogonality side constraints. The aim is to effect a single general formulation embracing a wide class of programming problems. Based on this formulation, an extension of the duality notions of linear programming is proposed for these problems.
Methods of Nonlinear Programming†



Philip Wolfe 

1. INTRODUCTION 

The problem of concern is that of maximizing the objective function $f(x)$ in the $n$ variables $x = (x_1, \ldots, x_n)$ subject to the constraints

$$g_i(x) \le 0 \quad \text{for } i = 1, \ldots, m \qquad (1)$$

Except for these constraints, it is supposed that each $x_j$ may assume any real value. The term "nonlinear programming" is not appropriately used to refer to any programming problem which is not linear; the functions above must have some kind of smoothness. Problems having integer-valued variables, for example, while being problems of "not-linear programming," are not conventionally taken to be problems of nonlinear programming. The functions $f$ and $g_i$ will be assumed differentiable throughout, although nothing is lost from most of our work if only piecewise differentiability is assumed.

Most of the methods studied here aim at finding a local solution of the problem, that is, a solution valid in the immediate neighborhood. If, however, $f$ is assumed concave and the $g_i$ convex, then any local solution is global, that is, it is the actual solution of the problem. Some of the procedures studied, in fact, require these properties in order to arrive at even a local solution. There are two equivalent definitions for the convexity of a differentiable function $g$:

For any two points $x, y$ and scalars $r, s \ge 0$, $r + s = 1$,

$$g(rx + sy) \le r\,g(x) + s\,g(y); \qquad (2)$$

For any two points $x, y$,



†This article is an abridgment of Recent Developments in Nonlinear
Programming, The RAND Corporation, R-401-PR, May 1962. The research 
was sponsored by the United States Air Force under Project RAND. Views 
or conclusions contained in this article should not be interpreted as repre- 
senting the official opinion or policy of the United States Air Force or The 
RAND Corporation. Permission to quote from or reproduce portions of 
this article must be obtained from The RAND Corporation. 
$$g(y) - g(x) \ge \nabla g(x)(y - x), \qquad (3)$$

where $\nabla g(x) = (\partial g/\partial x_1, \ldots, \partial g/\partial x_n)$ is the gradient of $g$, the vector pointing in the direction of its maximum rate of change.
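Definitions (2) and (3) can be compared numerically for a particular function. The Python sketch below uses the log-sum-exp function $g(x) = \log \sum_j e^{x_j}$, a standard differentiable convex function chosen by us for illustration, and tests both inequalities at random points. Sampling of this kind can only support convexity, never prove it.

```python
import numpy as np

def g(x):
    return np.log(np.sum(np.exp(x)))             # log-sum-exp, a convex function

def grad_g(x):
    e = np.exp(x)
    return e / e.sum()                           # its gradient (the softmax of x)

rng = np.random.default_rng(1)
ok_2 = ok_3 = True
for _ in range(1000):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    r = rng.random()
    s = 1.0 - r                                  # r, s >= 0 and r + s = 1
    ok_2 &= g(r * x + s * y) <= r * g(x) + s * g(y) + 1e-12   # Eq. (2)
    ok_3 &= g(y) - g(x) >= grad_g(x) @ (y - x) - 1e-12        # Eq. (3)
print(ok_2, ok_3)
```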

Figure 1 will be used to illustrate the general nonlinear programming problem. The constraint set is bounded by five nonlinear constraints, whose boundaries $g_i(x) = 0$ intersect in the edges and vertices of the figure. Convexity of the $g_i$ entails convexity of the set, as shown. A suitable concave objective function is $f(x) = -\sum_j (x_j - P_j)^2$; the problem is then that of finding the point of the constraint set closest to $P$.

Table 1 summarizes some of the features of the procedures studied in the sequel:

- whether the procedure is designed for a quadratic (Q) or a general nonlinear (N) objective function;
- whether it is designed for linear (L) or nonlinear (N) constraints;
- whether further assumptions are needed to ensure convergence to a local solution;
- whether the procedure terminates in an exact solution if the constraints are linear and the objective either linear or quadratic.

Note that a procedure which can handle nonlinear constraints can always handle a nonlinear objective: a new variable $z$ and a constraint of the form $z - f(x) \le 0$ may be added to the problem, and the linear objective $z$ maximized instead. The matter of termination in the linear and quadratic applications of the procedure is possibly not of great interest, but may bear on the speed of convergence in other cases. The term "convexity" is used in the table to refer to convexity of the constraints and concavity of the objective.

Table 1
FEATURES OF THE METHODS

                                                               Conditions for     Termination if objective is
Method                                 Objective  Constraints  convergence        Lin.     Quad.
Direct differential gradient           N          N            not known          no       no
Lagrangian differential gradient       N          N            strict convexity   no       ?
Simplex method for quadratic progr.    Q          L            convexity          yes      yes
Gradient projection I                  N          L            ?                  yes      no
Gradient projection II                 N          N            ?                  yes      no
Reduced-gradient                       N          L            ?                  yes      no
Separable programming                  N          N            ?                  yes      no
Decomposition                          N          N            convexity          yes      no
Cutting-plane                          N          N            convexity          yes      no
Accelerated cutting-plane              N          L            convexity          yes      yes

(? = entry not legible in the source.)

All these methods are discussed in the sections which follow. Computer 
routines for them and computational experience are cited wherever possible, 
but there is little to report. Lacking data which would permit the compari- 
son of any of these methods, we nevertheless have opinions on the relative 
efficiencies of some of them. We trust that these opinions, which are de- 
livered at the end of each section, will be readily distinguished from the 
more objective portion of this paper. 

The present paper is a modest condensation of an unpublished report [1]. Discussion of a procedure [2] for quadratic programming which has been superseded by others [3, 4] has been omitted. So has a discussion of procedures for finding a feasible solution to a nonlinear problem, that is, a point $x$ satisfying the constraints of Eq. (1). The latter discussion can be summarized in two remarks. For linear constraints, the simplex method for linear programming provides a means for finding a feasible solution which is hard to improve upon. For nonlinear constraints, a procedure similar to that generally used with the simplex method can be employed: apply the nonlinear procedure under consideration to the minimization of the function $\sum_i \{g_i(x) : g_i(x) > 0\}$, which will vanish when $x$ is feasible.
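The phase-one device just described only requires the function $\sum_i \{g_i(x) : g_i(x) > 0\}$. A minimal sketch follows. The function name and the two example constraints are hypothetical.

```python
import numpy as np

def infeasibility(gs, x):
    """Sum of the positive constraint values g_i(x); zero iff x is feasible."""
    return sum(max(g(x), 0.0) for g in gs)

# Hypothetical constraint set: the unit disk intersected with x_1 <= 0.5.
gs = [lambda x: x @ x - 1.0,
      lambda x: x[0] - 0.5]

print(infeasibility(gs, np.array([0.0, 0.0])))   # feasible point: prints 0.0
print(infeasibility(gs, np.array([2.0, 0.0])))   # both violated: prints 4.5
```

The function is continuous but has kinks where a constraint becomes active, which a phase-one search must tolerate.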

2. DIRECT DIFFERENTIAL GRADIENT METHODS

The "differential gradient" procedures are best described by differential equations expressing the idea that a trial point $x$ is to be moved in the direction of greatest increase of the objective $f$, with appropriate alterations to enforce the constraints. Their basic form can be given as

$$\frac{dx}{dt} = \nabla f(x) - \sum_i \delta_i \nabla g_i(x), \qquad (4)$$

where

$$\delta_i = \begin{cases} 0 & \text{if } g_i(x) \le 0,\\ K & \text{if } g_i(x) > 0. \end{cases} \qquad (5)$$

In Eq. (5), the number $K$ is chosen sufficiently large to keep $x$ from leaving the constraint set. For example, $K$ may be taken larger than the maximum of all $|\nabla f(x)| / |\nabla g_i(x)|$ for any $x$ on the boundary. The terms of $\sum_i \delta_i \nabla g_i(x)$ serve to "kick back" $x$ when it tends to leave the constraint set.

It can be shown, under fairly general conditions, that these equations have a solution. Of course, one would not attempt to find it analytically. One would rather use a digital procedure: with a suitably selected interval $\Delta t$, the equations give a displacement $\Delta x$ moving the test point to $x + \Delta x$, and the process is repeated.
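The following is a minimal digital version of Eqs. (4) and (5), using the simplest (Euler) rule with a fixed interval $\Delta t$. The step size, the number of steps, the value $K = 10$ and the test problem are our own choices. The test problem is $f(x) = -\sum_j (x_j - P_j)^2$ with $P = (2, 2)$ and the single constraint $g(x) = x_1 + x_2 - 2 \le 0$, whose solution is $(1, 1)$. Here $K = 10$ comfortably exceeds the ratio $|\nabla f| / |\nabla g| = 2$ at the solution. The iterate chatters about the boundary, so only approximate agreement with the solution should be expected.

```python
import numpy as np

def direct_differential_gradient(grad_f, gs, grad_gs, x0, K, dt=1e-3, steps=20000):
    """Euler integration of Eq. (4), dx/dt = grad f(x) - sum_i delta_i grad g_i(x),
    with delta_i = 0 if g_i(x) <= 0 and delta_i = K if g_i(x) > 0 (Eq. 5)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        dx = grad_f(x)
        for g, grad_g in zip(gs, grad_gs):
            if g(x) > 0:                       # violated: "kick back" toward the set
                dx = dx - K * grad_g(x)
        x = x + dt * dx                        # displacement Delta x = dt * dx/dt
    return x

P = np.array([2.0, 2.0])                       # hypothetical target point
x_hat = direct_differential_gradient(
    grad_f=lambda x: -2.0 * (x - P),           # f(x) = -sum_j (x_j - P_j)^2
    gs=[lambda x: x[0] + x[1] - 2.0],          # g(x) = x_1 + x_2 - 2 <= 0
    grad_gs=[lambda x: np.array([1.0, 1.0])],
    x0=[0.0, 0.0],
    K=10.0,
)
print(x_hat)                                   # close to the solution (1, 1)
```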

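The question of the convergence of the solution $x(t)$ of the differential equations has not been studied in detail, as far as we know. Assuming, however, that $\bar x = \lim_{t \to \infty} x(t)$ exists, and that the functions $\delta_i$ have an average value

$$\bar u_i = \lim_{T \to \infty} \frac{1}{T} \int_0^T \delta_i(x(t))\, dt,$$

we can show that $\bar x$ solves the problem, for:

$$0 = \lim_{T \to \infty} \frac{x(T) - x(0)}{T} = \lim_{T \to \infty} \frac{1}{T} \int_0^T \frac{dx}{dt}\, dt = \nabla f(\bar x) - \sum_i \bar u_i \nabla g_i(\bar x).$$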
The point $\bar x$ must satisfy the constraints, for if $g_i(\bar x) > 0$ then $\bar u_i = K$, which is impossible, owing to the choice of $K$. Further, it is evident from the definition of $\delta_i$ that if $g_i(\bar x) < 0$ for any $i$ then $\bar u_i = 0$; and in general $\bar u_i \ge 0$. We have thus proved the existence of the generalized Lagrange multipliers $u_i$ introduced into nonlinear programming by Kuhn and Tucker [5]:

If $\bar x$ solves the nonlinear programming problem, then there exist nonnegative $u_i$ such that

$$\nabla f(\bar x) = \sum_i u_i \nabla g_i(\bar x) \quad \text{and} \quad g_i(\bar x) < 0 \text{ implies } u_i = 0. \qquad (6)$$



It is easy to show that if $f$ is concave and the $g_i$ convex, then the existence of multipliers satisfying these conditions is sufficient for $\bar x$ to be a solution, provided it satisfies the constraints. Many other derivations of this result are available, and the multipliers themselves have considerable significance when suitably interpreted in applications.† They will appear here and there in the sequel for computational purposes.

There has been a certain amount of activity in differential gradient 
procedures for some twelve years; Brown [7] surveys in detail most of the 
work done before 1957, and the basic idea is periodically proposed afresh 
[8]. In view of the fact that no successful numerical experience on signifi- 
cant problems has yet been reported, we feel that this class of method is 
not promising. An important measure of the inefficiency of a digital proce- 
dure is certainly the number of evaluations of nonlinear functions it re- 
quires; and it appears that these methods require a great number. No doubt 
the fact that they cannot give exact answers to problems for which other 
methods can has also worked against their acceptance. 

These objections are not important if analog, rather than digital, compu- 
tation is considered; the analog computer is a natural setting for the basic 
differential equations, and several experiments in this direction have been 
made [9-11]. Solutions can be obtained almost instantaneously and param- 
eters of the problem are readily changed. The possibilities of analog equip- 
ment, especially for problems of linear programming, do not seem to have 
been exploited to the degree they might; some reasons, probably, are the 
rather long setup time and the fact that, if general-purpose equipment is
used, a large machine is needed for problems of reasonable size. The view 
that the accuracy of analog equipment is insufficient for mathematical 
programming is usually abandoned when the accuracy of the input data is 
carefully considered. 



3. THE LAGRANGIAN DIFFERENTIAL GRADIENT METHOD 

The Lagrangian differential gradient method uses differential equations, 
like Eqs. (4, 5) above, but explicitly introduces the Lagrange multipliers 
whose existence was inferred in Sec. 2. The process is governed by the 
differential equations 



†See Gale [6], for example, for many illustrations of their usefulness.
$$\frac{dx}{dt} = \nabla f(x) - \sum_i u_i \nabla g_i(x), \qquad (7)$$

$$\frac{du_i}{dt} = \begin{cases} g_i(x) & \text{if } u_i > 0 \text{ or } g_i(x) > 0,\\ 0 & \text{otherwise.} \end{cases} \qquad (8)$$

It can be shown that if $f$ is strictly concave and the $g_i$ are strictly convex, then the differential equations have solutions $x(t)$, $u(t)$ which converge to values $\bar x$, $\bar u$ as $t \to \infty$. It is then easily argued that $\bar x$ and $\bar u$ satisfy the Kuhn-Tucker conditions, Eq. (6), and thus solve the problem. Strict convexity is essential in this procedure; it will not converge, for example, if $f$ and the $g_i$ are all linear.
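Eqs. (7) and (8) can be integrated the same way. The sketch below uses the Euler rule and clips each $u_i$ at zero after every step. The clipping is a discretization detail of our own; the continuous equations keep $u_i \ge 0$ automatically. The hypothetical test problem maximizes the strictly concave $f(x) = -(x-2)^2$ subject to the strictly convex constraint $x^2 - 1 \le 0$, as the convergence result requires. Its solution is $\bar x = 1$ with multiplier $\bar u = 1$.

```python
import numpy as np

def lagrangian_differential_gradient(grad_f, gs, grad_gs, x0, dt=1e-3, steps=20000):
    """Euler integration of Eqs. (7)-(8):
    dx/dt   = grad f(x) - sum_i u_i grad g_i(x),
    du_i/dt = g_i(x) if u_i > 0 or g_i(x) > 0, and 0 otherwise."""
    x = np.asarray(x0, dtype=float)
    u = np.zeros(len(gs))
    for _ in range(steps):
        g_val = np.array([g(x) for g in gs])
        dx = grad_f(x) - sum(ui * grad_g(x) for ui, grad_g in zip(u, grad_gs))
        du = np.where((u > 0) | (g_val > 0), g_val, 0.0)
        x = x + dt * dx
        u = np.maximum(u + dt * du, 0.0)       # discretization detail: keep u_i >= 0
    return x, u

x_bar, u_bar = lagrangian_differential_gradient(
    grad_f=lambda x: -2.0 * (x - 2.0),         # f(x) = -(x - 2)^2, strictly concave
    gs=[lambda x: float(x @ x) - 1.0],         # g(x) = x^2 - 1 <= 0, strictly convex
    grad_gs=[lambda x: 2.0 * x],
    x0=[0.0],
)
print(x_bar, u_bar)                            # near x_bar = 1, u_bar = 1
```

On this example the trajectory overshoots the boundary before settling, which illustrates the remark below that $x$ may leave the constraint set during the computation.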

The Lagrangian differential procedure has the odd property that the point $x$ may depart from the constraint set during the computation; the multipliers $u_i$, which eventually force it back in, do not immediately become large. Of the methods studied here, only this method and the cutting-plane method below thus operate at times outside the constraint set.

Most of the theoretical work on this method has been reported by Arrow, Hurwicz, and Uzawa [12]. Thomas Marschak [12] reports some experiments programmed for the RAND JOHNNIAC computer on an ordinary linear programming problem. Using a high-precision scheme for obtaining the trajectory of Eqs. (7, 8), that routine took an uncommonly long time to solve small problems. It does not appear that this procedure has been used as effectively as it might be, since the exact trajectory $x(t)$ is not of much interest. Even so, we feel that it is not likely to behave a great deal better than the direct differential gradient method.



4. SIMPLEX METHODS FOR QUADRATIC PROGRAMMING

"Quadratic programming" now conventionally denotes the problem of maximizing a quadratic function under linear constraints. In the notation of Sec. 1:

$$f(x) = \sum_j p_j x_j - \sum_{j,k} q_{jk} x_j x_k \quad (j, k = 1, \ldots, n) \qquad (9)$$

$$g_i(x) = \sum_j a_{ij} x_j - b_i \le 0 \quad (i = 1, \ldots, m) \qquad (10)$$

If the function $f$ is to be concave, the $n$ by $n$ matrix $Q = (q_{jk})$ must be positive semidefinite. This problem has a unique property among nonlinear programming problems: an exact solution may be obtained, as in linear programming, by linear methods, essentially because the gradient of $f$ is linear. There are two prominent "simplex methods" for quadratic programming: that of Beale [3] and that of Wolfe [13]. The first uses the ideas of the simplex method for linear programming in a more fundamental way, and thus has a better claim to the title of this section. Further, unlike the second procedure, it yields a local solution even when the objective is not concave. We shall nevertheless describe the latter here, on the grounds of greater familiarity.

Assuming $f$ concave, the conditions of Kuhn and Tucker, Eq. (6), are both necessary and sufficient. Letting $y_i = -g_i(x)$, for the quadratic problem they become

$$\sum_k a_{ik} x_k + y_i = b_i \quad (\text{all } i)$$
$$2 \sum_k q_{jk} x_k + \sum_i u_i a_{ij} + p_j z_j = p_j \quad (\text{all } j) \qquad (11)$$
$$y_i \ge 0, \quad u_i \ge 0, \quad z_j = 0$$

Ignoring for the moment the requirement $z_j = 0$, they have the initial solution

$$x_k = 0, \quad u_i = 0, \quad y_i = b_i, \quad z_j = 1 \quad (\text{all } i, j, k) \qquad (12)$$

Our task is to

$$\text{minimize } \sum_j z_j$$

under the constraints of Eq. (11) and the restriction:

$$\text{If } u_i \ne 0, \text{ retain } y_i = 0; \text{ and vice versa.} \qquad (13)$$

The minimization is performed exactly as in the simplex method for linear programming, except that the restriction of Eq. (13) serves to restrict the choice of incoming variable. The procedure will terminate with all $z_j = 0$, so that the $x_j$ thus found will solve the problem.
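Apart from the restriction (13), the Kuhn-Tucker system (11) is linear. A tiny quadratic program can therefore be solved by brute force:

- guess which constraints are tight ($y_i = 0$) and which are slack ($u_i = 0$);
- solve the resulting linear equations (11) with $z = 0$;
- keep the first guess that yields $u \ge 0$ and $y \ge 0$.

The sketch below does exactly that. It is our own illustration of what Wolfe's simplex procedure computes, not the procedure itself. It enumerates all $2^m$ patterns, so it suits only very small problems, and it assumes $Q$ positive definite so that the Kuhn-Tucker point is unique. The function name is hypothetical.

```python
import itertools
import numpy as np

def qp_by_enumeration(Q, p, A, b, tol=1e-9):
    """Maximize p^T x - x^T Q x subject to A x <= b (Eqs. 9-10), Q positive
    definite, by trying every complementary pattern allowed by (13):
    tight constraints have y_i = 0, the others have u_i = 0."""
    n, m = len(p), len(b)
    for k in range(m + 1):
        for S in map(list, itertools.combinations(range(m), k)):
            AS = A[S, :]
            # Linear part of (11) with z = 0: 2 Q x + A_S^T u_S = p, A_S x = b_S.
            kkt = np.block([[2.0 * Q, AS.T], [AS, np.zeros((k, k))]])
            rhs = np.concatenate([p, b[S]])
            try:
                sol = np.linalg.solve(kkt, rhs)
            except np.linalg.LinAlgError:
                continue                          # dependent tight rows: skip
            x, uS = sol[:n], sol[n:]
            y = b - A @ x                         # slacks y_i = -g_i(x)
            if np.all(uS >= -tol) and np.all(y >= -tol):
                u = np.zeros(m)
                u[S] = uS                         # u_i = 0 off the tight set
                return x, u                       # Kuhn-Tucker point found
    return None                                   # no pattern worked

# Hypothetical data: maximize 4x1 + 4x2 - x1^2 - x2^2, x1 + x2 <= 2, x1 <= 0.5.
x, u = qp_by_enumeration(np.eye(2), np.array([4.0, 4.0]),
                         np.array([[1.0, 1.0], [1.0, 0.0]]), np.array([2.0, 0.5]))
print(x, u)                                       # x = (0.5, 1.5), u = (1, 2)
```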

The Beale procedure has been programmed for the Ferranti Mercury computer under the name "Quandary A"; the version of November 1960 could accommodate 65 inequality constraints in 63 variables. The Wolfe procedure is used by the SHARE routine RSQP1 for the IBM 704 and 7090, accommodating problems for which the sum of the numbers of variables and constraints is less than 253. While both these procedures have steps not used in linear programming, the time taken for a quadratic problem does not seem to differ greatly from that taken for a linear problem of comparable size.

5. PROJECTED-GRADIENT METHODS

"Projected-gradient" methods can be viewed as resulting from the attempt to make a differential gradient method take steps as large as possible while never allowing the point $x$ to leave the constraint set. Several procedures of this sort have been proposed, both for linear and for nonlinear constraints (labelled respectively "gradient projection I" and "gradient projection II" in Table 1). We shall deal at greatest length with the procedure for linear constraints due to Rosen [14]. It is illustrated in Fig. 2, beginning at the feasible point $x^0$ and leading to the sequence of points $x^1, x^2, \ldots$. In the discussion below, the word "plane" denotes the entirety of a single hyperplane of the form $g_i(x) = 0$, whose intersection with the constraint set yields, in general, one of its faces.

Starting with the point $x^k$ in the constraint set, either one or two successors of $x^k$ are determined by the following steps. A particular set of planes is associated with $x^k$ at all times; initially, let this be the set of all planes which pass through $x^k$.

(1) Calculate $\nabla f(x^k)$.

(2) Find the projection of $\nabla f(x^k)$ onto the intersection of all the planes associated with $x^k$. (If there are no planes, as when $x^k$ is interior to the constraint set, this intersection is the whole space, and thus the projection is $\nabla f(x^k)$ itself.)

(3) If the projection is different from zero, extend a ray from $x^k$ in that direction, and define $x^{k+1}$ to be the farthest point along the ray belonging to the constraint set.

(a) If $f(x^{k+1}) > f(x^k)$, then the cycle is complete.

(b) Otherwise, choose $x^{k+2}$ so as to maximize the function $f$ on the segment $x^k x^{k+1}$; this completes the cycle.

(4) If the projection is equal to zero, then $\nabla f(x^k)$ may be written

$$\nabla f(x^k) = \sum_i u_i A_i$$

as a linear combination of normals $A_i$ to the planes associated with $x^k$ (the $A_i$ are chosen to point away from the constraint set).

(a) If all $u_i \ge 0$, then $x^k$ is the solution of the problem, for the Kuhn-Tucker conditions, Eq. (6), are satisfied.

(b) Otherwise, define a new set of planes to be associated with $x^k$ by deleting from the present set some one plane for which $u_i < 0$, and return to step (2).

Fig. 2. Projected-gradient method

It is assumed that the one-dimensional maximization problem which may have to be solved in step (3b) is not difficult to cope with, which is usually the case. In Fig. 2, the points $x^4$ and $x^6$ have been obtained as the result of maximizing $f$ on the segments $x^2x^3$ and $x^4x^5$; at these maxima $\nabla f$ is, of course, perpendicular to the segment in question. Convergence of the procedure in the case of a linear objective is not difficult to show.
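Steps (2) and (4) amount to a small linear-algebra computation. Write the outward normals of the planes associated with $x^k$ as the rows of a matrix $N$, assumed linearly independent. The projection in step (2) is then $d = \nabla f - N^T u$ with $u = (NN^T)^{-1} N \nabla f$, and when $d = 0$ the same $u$ gives the coefficients of step (4). The Python sketch below performs one pass of steps (2) and (4); the function name is ours. It leaves out the ray search of step (3). When several $u_i$ are negative, it deletes the plane with the most negative one, which is our choice: the text allows any plane with $u_i < 0$.

```python
import numpy as np

def projection_step(grad, N, tol=1e-10):
    """Steps (2) and (4) of the projected-gradient method at x^k.

    grad : gradient of f at x^k, shape (n,)
    N    : rows are the outward normals A_i of the planes associated with
           x^k (assumed linearly independent), shape (q, n)
    Returns ("move", d) with the nonzero projection d (go to step 3),
    ("optimal", u) when all u_i >= 0 (step 4a), or ("drop", i) naming a
    plane with u_i < 0 to delete (step 4b).
    """
    if N.shape[0] == 0:                          # no planes: interior point
        if np.linalg.norm(grad) > tol:
            return "move", grad
        return "optimal", np.zeros(0)
    u = np.linalg.solve(N @ N.T, N @ grad)       # coefficients of the normals
    d = grad - N.T @ u                           # projection onto the planes
    if np.linalg.norm(d) > tol:
        return "move", d
    if np.all(u >= -tol):
        return "optimal", u                      # Kuhn-Tucker conditions (6)
    return "drop", int(np.argmin(u))             # most negative u_i (our choice)

# Plane x_1 = const with outward normal (1, 0):
print(projection_step(np.array([1.0, 1.0]), np.array([[1.0, 0.0]])))   # move along (0, 1)
print(projection_step(np.array([-1.0, 0.0]), np.array([[1.0, 0.0]])))  # drop plane 0
```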

A closely related procedure, which need not actually project $\nabla f(x^k)$, can also be given. The projection of $\nabla f(x^k)$ onto the face in which $x^k$ lies turns out to be precisely the direction of steepest ascent for the function $f$ per unit distance in the Euclidean metric, if that direction is chosen so as to keep one in the constraint set. If, on the other hand, some other metric were used, a different algorithm would be obtained. The metric

$$\|\Delta x\| = \max\{|\Delta x_1|, \ldots, |\Delta x_n|\},$$

for example, changes the work of step (2). Instead of finding the projection of $\nabla f(x^k)$ onto the face of $x^k$, one determines a point $y$ that maximizes $\nabla f(x^k) \cdot y$ under the linear constraints of the original problem augmented by the constraints $|x_j^k - y_j| \le 1$. The direction of motion away from $x^k$ is then that of the ray from $x^k$ through $y$. Several variants of this procedure have been proposed by Zoutendijk [15], and a related one by Lemke [16]. Some of the work of Frisch [17] is close to this approach. An excellent detailed survey of projected-gradient procedures by Witzgall [18] exhibits the Rosen, Frisch, and Lemke methods as variants of a single basic scheme.

Rosen [14] has reported programming his method for the IBM 704 and 
7090. Computational experience with these has not been reported, except 
for the observation that the procedure does not seem as well suited to the
linear programming problem as does the simplex method. The experiments 
of Witzgall [19] support this observation, as well as indicating the possi- 
bility of numerical difficulties with this and with a Frisch procedure not 
shared by the simplex method. No other experience with these methods 
has been reported, although rumor has it that other Frisch and Zoutendijk 
procedures have been tried. 
Extensions of the gradient projection procedure to problems having nonlinear constraints have been proposed by Rosen [20] and by Fiacco, Smith, and Blackwell [21]. Both procedures operate roughly in this way: At any trial point $x$ the function $f$ and the constraints effective in the immediate neighborhood of $x$ [i.e., those for which $g_i(x)$ is close to zero] are replaced by their first-order Taylor's series approximations. In the thus linearized local problem, a step of carefully selected length is taken in accordance with the rules above for the method with linear constraints. If the new trial point is still feasible, this process can be repeated; if not, a step in the direction of $-\nabla g_i(x)$ for those $g_i(x) > 0$ (i.e., back through the boundaries of those constraints which were violated) should yield a feasible point. Rosen reports having programmed this procedure, but computational experience has not been cited.
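The restoration step can be sketched as follows. The step-length rule is our own assumption, not Rosen's or Fiacco, Smith, and Blackwell's: each violated constraint is corrected by the step $g_i(x)/\|\nabla g_i(x)\|^2$ along $-\nabla g_i(x)$, which zeroes its linearization (a Newton step), and the sweep is repeated until every $g_i(x)$ is within a tolerance of 0. The function `restore` and its parameters are hypothetical.

```python
def restore(x, constraints, tol=1e-9, max_iter=100):
    """Step back toward feasibility along -grad g_i for violated constraints.

    constraints: list of (g, grad_g) pairs; g maps a list of floats to a
    float, grad_g maps it to a list of floats.  Feasible means g(x) <= tol.
    """
    x = list(x)
    for _ in range(max_iter):
        violated = [(g, dg) for g, dg in constraints if g(x) > tol]
        if not violated:
            return x
        for g, dg in violated:
            value, grad = g(x), dg(x)
            norm2 = sum(c * c for c in grad)
            if value <= tol or norm2 == 0.0:
                continue  # repaired by an earlier step, or no usable gradient
            # Newton step on the linearized constraint: g(x) - step*|grad|^2 = 0.
            step = value / norm2
            x = [xi - step * ci for xi, ci in zip(x, grad)]
    raise RuntimeError("no feasible point found within max_iter sweeps")


def disk(p):
    """Convex constraint function of the unit disk, feasible when <= 0."""
    return p[0] ** 2 + p[1] ** 2 - 1.0


def disk_grad(p):
    return [2.0 * p[0], 2.0 * p[1]]


print(restore([2.0, 0.0], [(disk, disk_grad)]))  # close to [1.0, 0.0]
```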

The projected-gradient methods seem to constitute a sound attack on the 
nonlinear programming problem, especially as they are not dependent on 
any properties of the functions involved other than reasonable smoothness. 
The likelihood of their efficiency is much more convincing in the case of 
linear constraints than for nonlinear constraints, where it is not clear that 
they will perform much better than a differential gradient method designed 
to take fairly large steps. Some hope has been held that they might be use- 
ful even for linear programming problems, since in a single step a 
projected-gradient method can pass from a trial point on one side of the 
constraint set clear to the other side, not being required to pass from 
vertex to adjacent vertex as with the simplex method. No evidence has appeared to support this hope, for which two reasons may be given. First, it appears that the graph-theoretic diameter of a typical constraint set (that is, the maximum number of steps needed to trace a path from one vertex to any other, which is a measure of the work required by the simplex method) is smaller than intuition leads us to expect. Second, even if a
projected-gradient procedure were lucky enough to find the solution of the 
problem in one step, it would still require as many as m iterations to con- 
vince itself that no better point could be found. The fact that a procedure 
must not only solve a problem but also demonstrate that it has done so is 
often overlooked. 



6. THE REDUCED-GRADIENT METHOD



The reduced-gradient method is like the projected-gradient methods in using the gradient of the objective function to give the desired direction of motion. It is designed only for linear constraints, its computational basis being the simplex method for linear programming. The simplex method does not, of course, provide for solutions other than vertices of the constraint set; this procedure may be viewed [13] as an extension of the simplex method which makes such provision. Since the gradient is not projected but is "reduced" in order to impose the constraints of the problem, as is the objective of a linear problem in the simplex method, the procedure is not conveniently illustrated geometrically.
At the beginning of a typical step of the procedure there is a current feasible point $x^k$ and a simplex method basis, which is essentially a partition $x = (y, z)$ of the variables of the linear constraints $Ax = b$ into a set of dependent (or basic) variables $z$ and independent (or nonbasic) variables $y$, connected by the relations $z = \bar b - Sy$, which are equivalent to the original constraints. (Unlike the simplex method, $\bar b \ge 0$ is not required.) We suppose that the basic portion of $x^k$ is positive.

(1) Calculate $\nabla f(x^k)$, and obtain (as with the simplex method, using $\nabla f(x^k)$ as the coefficients of the objective) the "reduced costs"

$$ c_j = [\nabla f(x^k)]_j - \sum_i s_{ij} [\nabla f(x^k)]_i $$

for the nonbasic variables $y_j$ (where $i$ ranges over all the basic variables).

(2) Define $\Delta y$ by $\Delta y_j = \max\{0, c_j\}$ for all nonbasic $j$, $\Delta z = -S\,\Delta y$, and finally $\Delta x = (\Delta y, \Delta z)$; $\Delta x$ is the "reduced gradient." If $\Delta x = 0$, the problem is solved.

(3) Determine the step length $\bar\theta$ as the smaller of the two values of $\theta$ achieving

$$ \max_\theta \{\theta : x + \theta\,\Delta x \ge 0\}, \qquad \max_\theta f(x + \theta\,\Delta x), $$

and replace $x$ by $x + \bar\theta\,\Delta x$.

(4) If all basic variables of the new x are positive, return to step (1). 
Otherwise perform a simplex method pivot step, interchanging any 
vanishing basic variable with some non-vanishing nonbasic variable. 
Return to step (1). 
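Steps (1) and (2), together with the first half of step (3), reduce to a few lines of arithmetic on the canonical form $z = \bar b - Sy$. The sketch below (hypothetical function name, plain Python lists) returns the reduced costs, the direction $\Delta x = (\Delta y, \Delta z)$, and the largest $\theta$ keeping $x + \theta\,\Delta x \ge 0$; the one-dimensional maximization of $f$ along $\Delta x$ and the pivot of step (4) are left out.

```python
def reduced_gradient_step(S, grad_y, grad_z, y, z):
    """Steps (1)-(2) and the ratio test of step (3) for the basis z = b_bar - S y.

    S[i][j] is the coefficient s_ij of nonbasic y_j in the row of basic z_i;
    grad_y, grad_z are the nonbasic and basic components of grad f(x^k).
    Returns (c, dy, dz, theta_max).
    """
    m, p = len(S), len(grad_y)
    # Step (1): c_j = [grad f]_j - sum_i s_ij [grad f]_i.
    c = [grad_y[j] - sum(S[i][j] * grad_z[i] for i in range(m)) for j in range(p)]
    # Step (2): only nonbasic variables with positive reduced cost move.
    dy = [max(0.0, cj) for cj in c]
    dz = [-sum(S[i][j] * dy[j] for j in range(p)) for i in range(m)]
    # First part of step (3): largest theta with x + theta * dx >= 0
    # (infinite if no component of x decreases along dx).
    theta_max = float("inf")
    for value, delta in zip(y + z, dy + dz):
        if delta < 0:
            theta_max = min(theta_max, -value / delta)
    return c, dy, dz, theta_max


# Maximize f = 2*x1 + x2 subject to x1 + x2 + x3 = 1, with x3 basic (S = [[1, 1]]).
c, dy, dz, theta = reduced_gradient_step([[1.0, 1.0]], [2.0, 1.0], [0.0], [0.2, 0.2], [0.6])
print(c, dy, dz, theta)  # theta is about 0.2
```

In this example the objective is linear and increases along $\Delta x$, so step (3) takes the full ratio-test step; the basic variable $x_3$ then vanishes and step (4) would pivot.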

This procedure is very close to the simplex method. Indeed, it becomes the simplex method if in step (2) the definition of $\Delta y$ is altered so that $\Delta y_J = 1$ and $\Delta y_j = 0$ for $j \ne J$, where $J$ is such that $c_J = \max_j c_j$; then only one nonbasic variable is increased at a time, and one basic variable vanishes, so that only basic solutions appear. The method has been shown to converge to a solution for a nonlinear objective function and to terminate for a linear objective function if the objective is bounded and the constraints of the problem are nondegenerate.

This recent procedure has not been tested computationally. It is expected to behave like the projected-gradient methods.



7. THE SEPARABLE PROGRAMMING METHOD 

The separable programming method [22], like the decomposition method of the following section, makes use of a linear programming problem constructed to be a good approximation of the nonlinear problem. The data for the linear problem result from the evaluation of the functions of the nonlinear problem on a grid of points spanning a suitable portion of the space of the problem.

Let $x^1, x^2, \ldots, x^T$ be a collection of $n$-vectors. Any point $x$ of the convex hull of this collection (the smallest convex set containing it) may be written

$$ x = \sum_t \lambda_t x^t, \quad \text{where} \tag{14} $$

$$ \sum_t \lambda_t = 1 \quad \text{and} \quad \lambda_t \ge 0 \ (\text{all } t). \tag{15} $$

Given any function $h$ of $x$, the linearization of $h$ on the grid $x^1, \ldots, x^T$ is attained through the approximation

$$ h(x) \approx \sum_t \lambda_t h(x^t). \tag{16} $$

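Any mathematical programming problem becomes a linear problem in the variables $\lambda_t$ if $x$ and $h(x)$ are replaced throughout by their representations above. Using this representation, the mathematical programming problem may be stated in the approximate form:

$$ \text{Maximize } \sum_t \lambda_t f(x^t) \text{ subject to} \tag{17} $$

$$ \sum_t \lambda_t g_i(x^t) \le 0 \ (i = 1, \ldots, m), \quad \sum_t \lambda_t = 1, \quad \text{and} \quad \lambda_t \ge 0 \ (\text{all } t). \tag{18} $$

Let $\bar\lambda_1, \ldots, \bar\lambda_T$ be the solution of the problem stated in Eqs. (17) and (18). Then

$$ \bar x = \sum_t \bar\lambda_t x^t \tag{19} $$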

is offered as an approximate solution of the original problem. If the functions $g_i$ are convex, we have

$$ g_i(\bar x) = g_i\Big(\sum_t \bar\lambda_t x^t\Big) \le \sum_t \bar\lambda_t g_i(x^t) \le 0, \tag{20} $$

so that $\bar x$ satisfies its constraints. How closely $f(\bar x)$ approximates the maximum obtained in the linear problem is determined in general by the fineness of the grid. But if $f$ is concave, then

$$ f(\bar x) \ge \sum_t \bar\lambda_t f(x^t), $$

so that $\bar x$ yields at least as high a value of the objective function as is indicated by the solution of the linear problem.
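A small numerical check of Eqs. (14), (16), and (20), under illustrative data of our own choosing: one variable, the grid $-2, 0, 2$, and the convex constraint $g(x) = x_1^2 - 1 \le 0$. Weights that are feasible for the linear problem produce a point $\bar x$ that is feasible for the original constraint, as convexity predicts. The helper names are hypothetical.

```python
def combine(points, lam):
    """Eq. (14): the point x = sum_t lam_t x^t (points are tuples)."""
    n = len(points[0])
    return tuple(sum(w * p[k] for w, p in zip(lam, points)) for k in range(n))


def linearize(h, points, lam):
    """Eq. (16): the grid approximation h(x) ~ sum_t lam_t h(x^t)."""
    return sum(w * h(p) for w, p in zip(lam, points))


def g(p):
    """A convex constraint function, g(x) = x_1^2 - 1 (feasible when <= 0)."""
    return p[0] ** 2 - 1.0


grid = [(-2.0,), (0.0,), (2.0,)]
lam = [0.1, 0.8, 0.1]                 # nonnegative and summing to 1, Eq. (15)
lp_value = linearize(g, grid, lam)    # about -0.2, so lam is feasible for the LP
x_bar = combine(grid, lam)            # Eq. (19)
# Eq. (20): convexity gives g(x_bar) <= lp_value <= 0, so x_bar is feasible.
print(lp_value, g(x_bar))
```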

The observations above make grid linearization an effective tool for problems enjoying the proper convexity; but where convexity does not obtain, a more refined technique must be used. This technique has so far been implemented only for problems in which each nonlinear function is separable, that is, may be written as the sum of separate functions of the components $x_j$ of the point $x$:

$$ f(x) = \sum_j f_j(x_j), \qquad g_i(x) = \sum_j g_{ij}(x_j) \ (\text{all } i). \tag{22} $$

The linearization technique is applied separately to each variable $x_j$. Suppose that for each $j$ a sequence of values $x_j^1, \ldots, x_j^T$ has been chosen (we suppose the same number $T$ chosen for each $j$). Write

$$ x_j = \sum_t \lambda_j^t x_j^t. \tag{23} $$
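The resulting linear programming problem, derived from Eqs. (17) and (18), is

$$ \text{Maximize } \sum_j \sum_t \lambda_j^t f_j(x_j^t) \text{ subject to} \tag{24} $$

$$ \lambda_j^t \ge 0 \ (\text{all } j, t), \quad \sum_t \lambda_j^t = 1 \ (\text{all } j), \quad \text{and} \quad \sum_j \sum_t \lambda_j^t g_{ij}(x_j^t) \le 0 \ (\text{all } i). \tag{25} $$

From the solutions $\bar\lambda$ of this problem the approximate solutions

$$ \bar x_j = \sum_t \bar\lambda_j^t x_j^t \quad (\text{all } j) \tag{26} $$

of the original problem are obtained.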

These solutions will not be good approximations unless the following condition is satisfied: For each $j$, $\lambda_j^t$ must vanish except for, at most, two values of $t$, which must be adjacent. The condition ensures that the interpolation our formulas accomplish is always done between two adjacent grid points. (Curiously, for convex problems this automatically occurs.) In the separable programming algorithm, the condition is enforced by restricting the choice of incoming variable permitted to the simplex algorithm applied to the linearized problem: The nonbasic variable $\lambda_j^t$ is a candidate for the basis only if either $\lambda_j^{t+1}$ or $\lambda_j^{t-1}$ is already a member of the basis.
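For a single variable $x_j$ the adjacency condition and the entry restriction can be written as two small predicates. In this sketch (hypothetical names) `lam` lists $\lambda_j^1, \ldots, \lambda_j^T$ at 0-based positions and `basic` is the set of positions of those $\lambda_j^t$ currently in the basis; the entry rule follows the wording of the text literally.

```python
def satisfies_adjacency(lam, tol=0.0):
    """True if at most two weights are nonzero and any two are adjacent."""
    nonzero = [t for t, value in enumerate(lam) if abs(value) > tol]
    if len(nonzero) <= 1:
        return True
    return len(nonzero) == 2 and nonzero[1] == nonzero[0] + 1


def entry_candidates(basic, T):
    """Nonbasic positions t allowed to enter: a neighbour t - 1 or t + 1 is basic."""
    return [t for t in range(T)
            if t not in basic and (t - 1 in basic or t + 1 in basic)]


print(satisfies_adjacency([0.0, 0.3, 0.7, 0.0]))  # True
print(satisfies_adjacency([0.5, 0.0, 0.5]))       # False
print(entry_candidates({2}, 5))                   # [1, 3]
```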

If the grid is chosen suitably fine in the neighborhood of the solution, 
answers of good accuracy can be obtained, and the procedure can be aug- 
mented so that it automatically constructs its own refinements, in a manner 
like that of the decomposition procedure below. A version of the procedure 
for functions not necessarily separable into functions of single variables 
has also been given [22].




Separable programming is a feature of the mathematical programming 
routine SCM3, scheduled for release by the SHARE distribution agency in 
1962. The Standard Oil Company of California has used the method for 
several years on a variety of nonlinear problems, although details of their 
computational experience have not been given. 



8. THE DECOMPOSITION PROCEDURE 

In the case of nonseparable nonlinear problems, any grid of reasonable 
fineness covering a large region will include a tremendous number of 
points, posing considerable data processing problems. In actual fact, how- 
ever, only a small number of these points would ever actually be used in 
the computation. On account of the basic properties of the simplex method, 
the final solution of the approximating linear programming problem would
involve only m + 1 points; and probably only some small multiple of this
number would be used in the course of arriving at the final solution. These 
facts indicate that it would be well to investigate how grid points might be 
generated when needed, rather than all set down a priori. The decomposi- 
tion algorithm for linear programming is a particular device for using the 
data of very large linear programming problems of a certain form to gen- 
erate recursively only the needed data for a smaller linear programming 
problem whose iterated solutions solve the larger problem. What follows 
is essentially the application of this method to our nonlinear programming 
problem, conceived as being represented by a linear program of the form 
of Eqs. (17, 18) constructed from an arbitrarily fine grid. 

Let a grid $x^1, \ldots, x^T$ be given, and let the associated linear programming problem, Eqs. (17, 18), be solved, yielding as well as the solution $\bar\lambda_1, \ldots, \bar\lambda_T$ the dual solution $\bar u_0, \bar u_1, \ldots, \bar u_m$ (for convenience, the equation $\sum_t \lambda_t = 1$ is numbered 0). Allowing complete freedom in choice of grid points, we may pose the question: Of all possible points $x^{T+1}$ that might be adjoined to the given grid as a further refinement, which point would the simplex method first choose as contributing the most to the solution of the thus extended linear programming problem?

The column added to the problem, Eqs. (17, 18), will have the objective coefficient $f(x^{T+1})$ and the remaining coefficients $1, g_1(x^{T+1}), \ldots, g_m(x^{T+1})$. Recalling that in the "revised form" of the simplex method the reduced cost for a column is formed by subtracting from its cost coefficient the inner product of the dual solution of the problem with the coefficients of the column, and recalling that the column with the largest reduced cost is desired, we see that $x^{T+1}$ should be chosen as a solution of the problem

$$ \text{Maximize } f(x) - \bar u_0 - \sum_i \bar u_i g_i(x) \quad (x \text{ unconstrained}). \tag{27} $$
method once more employed to find a new solution to the expanded linear
programming problem. 

The repeated application of the procedure just described constitutes the 
decomposition algorithm for nonlinear programming. It can be shown that, 
when the functions involved are appropriately convex, the process con- 
verges to a solution of the original nonlinear programming problem, in the 
sense that any limit point of the sequence 

$$ \sum_{t=1}^{T} \bar\lambda_t x^t, \qquad T \to \infty, \tag{28} $$

is a solution of the problem. Unfortunately, it seems that the method will 
not give any useful results if the convexity assumptions do not hold. 

As far as making efficient use of the grid points needed is concerned, 
the decomposition algorithm seems perhaps as good as possible. Actually, 
the burden of the work has been shifted to the subproblem, Eq. (27), which 
must be solved afresh for each iteration using $\bar u_i$ from the previous iteration; the overall efficiency of the procedure depends on how readily it can
be solved. In the general case, it is not necessarily substantially easier to 
solve than the original problem, but in many special cases it is. The fact 
that it is expressed without inequalities often makes classical extremization 
techniques practical. 

For example, suppose that the original problem is separable: that is, that $f$ and $g_i$ are of the form given by Eqs. (22). Then the problem, Eq. (27), becomes

$$ \text{Maximize } \sum_j \Big[ f_j(x_j) - \sum_i \bar u_i g_{ij}(x_j) \Big] - \bar u_0. \tag{29} $$

Since there are no constraints on $x$ in this problem, its solution is obtained when each of the terms of the summation is independently maximized. The new $x^{T+1}$ is thus made up of the components $x_j^{T+1} = \bar x_j$ obtained from the solutions of the $n$ problems

$$ \text{Maximize } f_j(x_j) - \sum_i \bar u_i g_{ij}(x_j). \tag{30} $$

In most practical cases, these are readily solved by elementary calculus.
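For the separable case, generating the new grid point amounts to $n$ independent one-variable maximizations followed by a check of the reduced cost in Eq. (27); the column is worth adjoining only if that reduced cost is positive. The sketch below substitutes golden-section search for elementary calculus and therefore assumes each function in Eq. (30) is unimodal on a user-supplied interval; the function names and the interval argument are hypothetical.

```python
import math


def golden_max(phi, lo, hi, tol=1e-9):
    """Maximize a unimodal function phi on [lo, hi] by golden-section search."""
    r = (math.sqrt(5.0) - 1.0) / 2.0          # about 0.618
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if phi(c) >= phi(d):                  # maximum lies in [a, d]
            b, d = d, c
            c = b - r * (b - a)
        else:                                 # maximum lies in [c, b]
            a, c = c, d
            d = a + r * (b - a)
    return (a + b) / 2.0


def new_grid_point(f_parts, g_parts, u0, u, bounds):
    """Solve the n problems (30); return x^{T+1} and its reduced cost from (27).

    f_parts[j] is f_j, g_parts[i][j] is g_ij, (u0, u) are the LP duals, and
    bounds[j] is an interval assumed to contain the maximizer of problem j.
    """
    x_new = []
    for j, f_j in enumerate(f_parts):
        def phi(t, j=j, f_j=f_j):
            return f_j(t) - sum(u_i * g_i[j](t) for u_i, g_i in zip(u, g_parts))
        x_new.append(golden_max(phi, *bounds[j]))
    objective = sum(f_j(t) for f_j, t in zip(f_parts, x_new))
    penalty = sum(u_i * sum(g(t) for g, t in zip(g_i, x_new))
                  for u_i, g_i in zip(u, g_parts))
    return x_new, objective - u0 - penalty    # adjoin the column if this is > 0


# One variable: f_1(t) = 4t - t^2, one constraint with g_11(t) = t, duals 0.5 and 2.
x, rc = new_grid_point([lambda t: 4 * t - t * t], [[lambda t: t]], 0.5, [2.0], [(0.0, 5.0)])
print(x, rc)  # x close to [1.0], reduced cost close to 0.5
```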

The decomposition procedure for nonlinear problems has so far been 
tried only in an experimental routine written by Shapiro [23] for the IBM 
704, designed to solve the "chemical equilibrium" problem of minimizing 
the expression $\sum_j x_j \big(c_j + \ln(x_j / \sum_k x_k)\big)$ under linear constraints. While detailed records of computational experience were not kept, the procedure
worked satisfactorily, although no better than a special method devised 
earlier for this problem [24]. 

9. THE CUTTING-PLANE METHOD 

The cutting-plane method of Kelley [25] is a dual of the decomposition method (in a manner which can be made precise, but which is not done here); while the decomposition method was based on the idea that the constraint set could be represented as the set of all convex combinations of a sufficiently dense set of points in it, the cutting-plane method is based on the idea that it can be represented as the intersection of a sufficiently numerous set of half-spaces which contain it. In both procedures we try to do as good a job as possible of deciding what data are needed before calculating them.

It is most convenient to describe the method for a problem having a linear objective function; as mentioned in the Introduction, no generality is lost. We must assume that the nonlinear constraint functions are convex; the procedure will not give reasonable answers otherwise.

The main tool of the procedure is the representation of the constraints by first-order Taylor's series approximations. Let $x^t$ be some $n$-vector. Expanding the function $g_i$ about $x^t$, the nonlinear constraint $g_i(x) \le 0$ will be replaced by the linear constraint

$$ g_i(x^t) + \nabla g_i(x^t)(x - x^t) \le 0. \tag{31} $$

Note that the left-hand side of Eq. (31) is never greater than $g_i(x)$, since $g_i$ is convex; so that if $x$ happens to belong to the constraint set, that is, if $g_i(x) \le 0$ for all $i$, then $x$ will satisfy every inequality of the form of Eq. (31), for any $x^t$.

Let now a sequence of points $x^1, \ldots, x^T$ be given. The linear programming problem to be solved as an approximation to the original problem is:

Maximize $f(x)$ subject to

$$ g_i(x^t) + \nabla g_i(x^t)(x - x^t) \le 0 \quad (i = 1, \ldots, m;\ t = 1, \ldots, T). \tag{32} $$

If the solution $x = \bar x$ of this problem should happen to satisfy all the original constraints, then it would be the solution of the original problem, because it would maximize the objective over a constraint set (that defined by Eq. (32)) which is at least as large as the original.

The recursive step of the cutting-plane procedure is this: If $\bar x$ does not satisfy all the constraints of the original problem, define $x^{T+1} = \bar x$, use $x^{T+1}$ to construct new linear inequalities of the form of Eq. (32), and solve the new linear programming problem.
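The cutting-plane iteration can be seen in full in one dimension, where the linear program can be solved by inspection. The sketch below (hypothetical names) maximizes the linear objective $x$ subject to a convex $g(x) \le 0$ and bounds $lo \le x \le hi$; the upper bound $hi$ supplies the required bounded starting problem. For $g(x) = x^2 - 2$ each cut $2x^t x \le (x^t)^2 + 2$ moves the linear-program optimum to $((x^t)^2 + 2)/(2x^t)$, which is Newton's iterate for $\sqrt 2$, and every intermediate point lies outside the constraint set, as the text notes.

```python
import math


def kelley_1d(g, dg, lo, hi, tol=1e-10, max_iter=100):
    """Cutting-plane sketch: maximize x subject to g(x) <= 0, lo <= x <= hi.

    g must be convex with derivative dg.  Each cut a * x <= b is the
    linearization (31) of g about the latest linear-program solution.
    """
    cuts = []
    x = hi                                   # LP optimum before any cuts
    for _ in range(max_iter):
        if g(x) <= tol:
            return x                         # LP solution satisfies g(x) <= 0
        a = dg(x)
        cuts.append((a, a * x - g(x)))       # g(x_t) + g'(x_t)(x - x_t) <= 0
        lower, upper = lo, hi
        for a_k, b_k in cuts:                # re-solve the one-variable LP
            if a_k > 0:
                upper = min(upper, b_k / a_k)
            elif a_k < 0:
                lower = max(lower, b_k / a_k)
            elif b_k < 0:
                raise ValueError("cut 0 <= b_k < 0: problem infeasible")
        if lower > upper:
            raise ValueError("linearized problem infeasible")
        x = upper                            # maximize x over [lower, upper]
    raise RuntimeError("no convergence within max_iter cuts")


root = kelley_1d(lambda x: x * x - 2.0, lambda x: 2.0 * x, 0.0, 10.0)
print(root, math.sqrt(2.0))
```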

The convergence of the procedure is not difficult to prove. The only starting condition which must be assumed is that an initial set of points $x^1, \ldots, x^T$ can be chosen so that the objective of the linear programming problem is bounded above. (If any of the family of linear problems thus generated should have no point satisfying its constraints, it would follow that the original problem had the same property, so that satisfaction of the constraints is guaranteed if the original problem is known to have solutions.)

It is noteworthy that, unless the process terminates, the added point $x^{T+1}$ always lies outside the constraint set. Neither does that point satisfy all the constraints constructed from it for the next iteration, since letting $x = x^{T+1}$ in the relation of Eq. (32) gives $g_i(x^{T+1}) \le 0$, which cannot hold for all $i$. The point $x^{T+1}$ thus lies on the opposite side from the constraint set of the hyperplane

$$ g_i(x^{T+1}) + \nabla g_i(x^{T+1})(x - x^{T+1}) = 0 \tag{33} $$

for such $i$ that $g_i(x^{T+1}) > 0$. These hyperplanes constitute "cuts," cutting off pieces of the polyhedral constraint set defined by Eq. (32) and producing an improved approximation to the original constraint set in the neighborhood of the point $x^{T+1}$.

Considerable advantage can be taken of the fact that the linear program, 
Eq. (32), does not change appreciably from one iteration to the next. The 
dual simplex method makes it possible to add constraints to a linear prob- 
lem which has already been solved and efficiently find a solution to the new 
problem. In this respect the cutting-plane method is a sort of "dual" of the 
columnar methods, in which rows rather than columns are added to a linear 
programming problem at each iteration. 

In practice one would not add all constraints of the form of Eq. (32) at each iteration. The best scheme would probably be to add a single linear constraint corresponding to the "most unsatisfied" nonlinear constraint, formed for that $i$ which achieves $\max_i g_i(x^{T+1})$. This step is then exactly analogous to the decomposition procedure step of augmenting its linear programming problem by the single most profitable column. Figure 3 shows two steps in this process.

Fig. 3. The cutting-plane method

The following acceleration device [26] has been proposed for the cutting-plane method as applied to a problem having linear constraints: At, say, every other step of the procedure, define a point $\hat x$ from which a new inequality is to be constructed by

$$ \hat x = \sum_{t=1}^{T} u^t x^t, $$

where $u^t$ is the sum of the values of the dual variables associated with the constraints generated from $x^t$. Whether this device will indeed accelerate the procedure in general is not known, but if the objective is quadratic, then the procedure will terminate: some $\hat x$ will solve the problem.

Computer routines using the cutting-plane method have been reported by Dornheim [27] and, in a simplified version, by Griffith and Stewart [28], but they are not available, and experience with them has not been detailed. It appears that the method has given these users satisfaction. While we have no experimental evidence to support our view of this method, its general tidiness leads us to feel that it should constitute an excellent procedure for problems having convex nonlinear constraints.
ON THE GRADIENT PROJECTION METHODS
OF R. FRISCH AND J. B. ROSEN

C. Witzgall

ABSTRACT 

R. Frisch (1958, Multiplex Method) and J. B. Rosen (1959) gave two 
closely related methods for solving linear and convex programming prob- 
lems. Both methods use the same tableau technique, but different rules for 
selecting the pivot. 

A few experiments were conducted in the linear case in order to com- 
pare these methods with the simplex method. These experiments indicate 
a superior numerical stability of the simplex method. Compared with 
Rosen's pivot selection, the one of Frisch paid off by reducing the number
of operations and increasing numerical stability.

More or less all algorithms for solving the linear programming problem 
are known to be modifications of an algorithm for matrix inversion. Thus 
the simplex method corresponds to the Gauss-Jordan method. The methods
of Frisch and Rosen are based on an interesting method for inverting sym- 
metric matrices. However, this method is not a happy one, considered 
from the numerical point of view, and this seems to account for the relative 
instability of the projection methods. 

The iteration steps of Rosen and Frisch may be interpreted as simplex 
steps using a tableau which is based on the product $A^T A$, where $A$ denotes
the constraint matrix. This interpretation leads into the neighborhood of 
techniques due to P. Wolfe and G. Zoutendijk. 

As far as the author is informed, the termination of the multiplex 
method of Frisch is still not established, even in case of nondegeneracy. 
Of course, this is a largely theoretical problem since nontermination is, 
in any case, highly improbable. In this paper, two sets of pivot selection 
rules are presented, which allow a termination proof. They may be re- 
garded as a "primal" and a "dual" method. 
The Simplex Method for Local Separable Programming

Clair E. Miller†

FUNCTIONS OF A SINGLE VARIABLE 

An important problem in mathematical programming is to generalize 
the simplex method to solve certain nonlinear programming problems 
[1, 6, 10]. This paper describes such a generalization, to a class of prob- 
lems known as separable; i.e., ones in which the only nonlinear functions 
allowed are functions of a single variable. The method has been pro- 
grammed and has been in productive use for over two years. 

The basic idea is to use a representation of polygonal functions in 
terms of linear equations coupled with logical restrictions and to employ 
a modified simplex method which enforces the required logical restrictions. These ideas are implicit in the work of many authors, notably Dantzig [2, 3], Charnes and Cooper [1], and Manne and Markowitz [8].

Polygonal Approximation 

Suppose a function $f$ of a single variable is replaced by a piecewise linear approximation. Then there are finitely many points $P_i = (a_i, f(a_i)) = (a_i, b_i)$ on the graph of $f(x)$, and linear interpolation between adjacent points will approximate $f$ satisfactorily. An example is shown in Fig. 1. This relation between $y$ and $x$ can be described by introducing variables $x_0, x_1, \ldots, x_k$ with $x_i \ge 0$, $i = 0, \ldots, k$, and requiring that

$$ \sum_i x_i = 1 \tag{1} $$

$$ x = \sum_i a_i x_i \tag{2} $$

$$ y = \sum_i b_i x_i \tag{3} $$



†This work was done while the author was with Standard Oil of
California.
No more than two $x_i$ can be nonzero, and these must be consecutive. (4)

Except for (4), the system of equations would be purely linear. The modified simplex method enforces this requirement.

Fig. 1. The approximation to $u = x^2$.
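Equations (1) through (4) can be illustrated by computing the special-variable values that represent a given $x$: locate the two breakpoints that bracket $x$ and split the unit weight between them. The sketch uses the grid of Fig. 1 for $u = x^2$ and is our own illustration; `polygonal_weights` and `polygonal_value` are hypothetical names, and in the algorithm itself these values come out of the simplex method rather than being computed directly.

```python
import bisect


def polygonal_weights(a, x):
    """Special-variable values x_0..x_k representing the point x.

    a holds increasing breakpoints a_0 < ... < a_k.  The weights satisfy
    Eqs. (1) and (2), and at most two consecutive ones are nonzero, as (4) requires.
    """
    if not a[0] <= x <= a[-1]:
        raise ValueError("x lies outside the breakpoints")
    w = [0.0] * len(a)
    i = bisect.bisect_right(a, x) - 1         # a[i] <= x < a[i + 1]
    if i == len(a) - 1:                       # x is the last breakpoint
        w[i] = 1.0
        return w
    theta = (x - a[i]) / (a[i + 1] - a[i])
    w[i], w[i + 1] = 1.0 - theta, theta
    return w


def polygonal_value(b, w):
    """Eq. (3): y = sum_i b_i x_i."""
    return sum(bi * wi for bi, wi in zip(b, w))


# Fig. 1: u = x^2 approximated on the breakpoints 0, 0.2, ..., 1.0.
a = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
b = [ai ** 2 for ai in a]
w = polygonal_weights(a, 0.5)
print(w)                         # weight 0.5 on each of the breakpoints 0.4 and 0.6
print(polygonal_value(b, w))     # about 0.26, versus the exact 0.25
```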



The Algorithm 



Data are given as in a linear problem, with the addition that certain sets of variables are designated as "special." There is one set, $S = (x_0, x_1, \ldots, x_k)$, for each nonlinear function $y = f(x)$ (see Fig. 2). The simplex algorithm is modified to inhibit pricing (calculation of the reduced cost coefficients) of the special variables within each set, as follows:

a. If no element of S is in the basis, then all of S will be allowed for pricing. (This can occur only when artificial variables are in the basis.)

b. If precisely one element of S is in the basis, then only the variable (if any) immediately preceding it and the variable (if any) immediately following it within S are allowed for pricing.

c. If two variables from S are in the basis, then no others from S shall be allowed for pricing.
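Rules a to c amount to a filter applied to each special set before pricing. A sketch, with 0-based positions $0, \ldots, k$ within $S$ and a hypothetical function name; a pricing routine would call it once per special set and skip the excluded columns.

```python
def priceable(k, basic):
    """Positions in S = (x_0, ..., x_k) allowed for pricing under rules a-c.

    basic: set of positions of S that are currently in the basis.
    """
    if not basic:                                     # rule a
        return list(range(k + 1))
    if len(basic) == 1:                               # rule b
        (i,) = basic
        return [t for t in (i - 1, i + 1) if 0 <= t <= k]
    return []                                         # rule c: two are basic


print(priceable(5, set()))    # [0, 1, 2, 3, 4, 5]
print(priceable(5, {0}))      # [1]
print(priceable(5, {3}))      # [2, 4]
print(priceable(5, {2, 3}))   # []
```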

For separable programming the ordinary calculations of the simplex algorithm are carried out except that the new rules are independently applied at each iteration to each of the sets of special variables. These enforce the requirement that within each such set no more than two of the special variables can be nonzero and these two must be adjacent.

Fig. 2. The constraint matrix for the example problem.



An Example 

To illustrate, consider the problem of maximizing $z$, subject to:

$$ x^2 + y^2 + z^2 = 1. \tag{5} $$

Restated, (5) becomes:

$$ u + v + w = 1 \tag{6} $$

$$ u = x^2 \tag{7.1} $$

$$ v = y^2 \tag{7.2} $$

$$ w = z^2 \tag{7.3} $$

Replacing (7.1) by a polygonal approximation gives:

$$ \begin{aligned} 1 &= x_0 + x_1 + \cdots + x_5 \\ x &= 0x_0 + .2x_1 + \cdots + 1.0x_5 \\ u &= 0x_0 + .04x_1 + \cdots + 1.0x_5 \end{aligned} \tag{8} $$

and similar sets of equations, and special variables, for (7.2) and (7.3). 
The constraint matrix for the example is shown in Fig. 2. Note that it has three sets of special variables

$x_0, x_1, \ldots, x_5$
$y_0, y_1, \ldots, y_5$
$z_0, z_1, \ldots, z_5$

one for each of the nonlinear equations (7.1), (7.2), and (7.3). Evidently, then, these parabolas have been replaced by polygonal functions (Fig. 1), and these in turn induce a replacement of the original constraint set (the surface of a sphere) by a piecewise linear approximation to it. A sketch of the resulting constraint set is shown in Fig. 3.

This example was computed using a standard linear programming code modified to do separable programming. It terminated Phase I with the initial feasible solution, $Q_{16}$, shown in Fig. 3. It took eight more iterations in Phase II, stepping along the path indicated in Fig. 3, to reach the optimal solution, the north pole of the sphere.

Optimality and Termination

To have a workable algorithm, it must be shown that the process 
terminates after a finite number of iterations and that the terminal solu- 
tion is locally optimal. Since the process presented here is a simplex 
method, the same reasoning applies as in the proofs of these facts for the 
ordinary simplex algorithm [4, 5]. This reasoning is sketched below.

At each step the objective function increases (barring degeneracy) and, therefore, a basis cannot reappear, once having been used. Since there are only finitely many bases, the process terminates. Cycling theoretically can occur around degenerate bases but, as in the ordinary simplex method, it hasn't been found to happen. It can be prevented by the use of lexicographic ordering (i.e., the so-called ε method of Charnes or Dantzig, Orden and Wolfe).

The terminal solution will be an optimum solution in a local sense. 
That is, no other feasible solution sufficiently close to it will have a better
objective value. If the problem possesses more than one local optimum 
solution there is no guarantee that the separable programming process 
will select the best among them. But for a large class of problems, in- 
cluding linear and convex problems, there is only one local optimum, and 
the separable programming process will find it. 

Consider the terminal solution and examine a particular set of special variables $S = (x_0, \ldots, x_k)$. In view of equation (1) there must be at least one element of S in the basis. Three cases can arise:

Case 1. One (say $x_i$) of S is basic. Necessarily $x_i = 1$.

Case 2. Two (say $x_i$, $x_{i+1}$) of S are basic, and $x_i \ne 0$, $x_{i+1} \ne 0$.

Case 3. Two ($x_i$, $x_{i+1}$) of S are basic but one of them is zero.
Fig. 3. The approximate sphere of the example problem. Here $Q_{16}, \ldots, Q_{24}$ is the sequence of basic solutions through which the algorithm moved in Phase II. Subscripts are iteration numbers.

If Case 3 doesn't occur, look at the terminal solution again and take any other nearby feasible solution. Express this solution in terms of the variables of the problem using only those special variables which have just been priced. Specifically, if Case 2 occurs ($x_i$, $x_{i+1}$ basic), express the nearby solution using only $x_i$ and $x_{i+1}$, i.e., stay between $P_i$ and $P_{i+1}$ on the graph of $y = f(x)$, Fig. 1. If Case 1 occurs, stay between $P_{i-1}$ and $P_i$ or between $P_i$ and $P_{i+1}$, using only $x_{i-1}$ and $x_i$ or $x_i$ and $x_{i+1}$. But all of these special variables were already priced at the last simplex iteration and found to have disadvantageous reduced cost coefficients. Hence, evaluating the nearby feasible solution via the reduced objective functional shows it to have a less desirable (at any rate, no better) objective value than the terminal one. So the terminal one is a local optimum.

If Case 3 should occur, the algorithm may have terminated on a solution which is not a local optimum. However, Case 3, with $x_i = 0$ or $x_{i+1} = 0$, is a degenerate situation and rare. And, as the example in the next section illustrates, such a solution is the limiting case of a truly local optimum one, in an appropriate sense.

An Example of Degeneracy 

The following example illustrates how degeneracy among the special 
variables can cause the algorithm to stop with a nonoptimum solution. It 
also shows that such a case is truly singular and not of practical compu- 
tational significance. 

Consider the problem: 

Maximize $y$, subject to

$$ \begin{aligned} -x + y + s &= 0 \\ x - \tfrac{1}{2}x_0 - x_1 - 3x_2 &= 0 \\ y - x_1 - 2x_2 &= 0 \\ x_0 + x_1 + x_2 &= 1 \end{aligned} \tag{9} $$

In this problem there is one set of special variables, $S = (x_0, x_1, x_2)$. It is a two-dimensional problem, as sketched in Fig. 4.

The basis $x, y, x_0, x_1$ is the trouble maker. The canonical representation of the problem relative to the basis is

$$ \begin{aligned} y + 2s - 3x_2 &= 1 \\ x + s - 3x_2 &= 1 \\ x_0 - 2s + 2x_2 &= 0 \\ x_1 + 2s - x_2 &= 1 \end{aligned} \tag{10} $$

and it is degenerate because $x_0$ enters the solution at level 0.

This basis appears to the algorithm to be locally optimal since the only variable with negative cost coefficient, $x_2$, is not allowed for pricing. Thus the algorithm terminates erroneously at this solution (the true solution is evidently at $x = 3$, $y = 2$).

The trouble with this basis is that the vertex (1, 1) is exactly on the constraining line $y = x$. Had the vertex been below the line, the basis would not even have been feasible. Had it been above the line, the basis would have been feasible, nondegenerate, and truly a local optimum.

Useful Modeling Devices 

The algorithm does not directly take into account the structure of the 
special variables, equations (1), (2), and (3). This may be used to advantage in several ways. For example, if one replaces (1) by
$$ \sum_i x_i = z, $$

with the points $(a_i, b_i)$ vertices of the graph of $f(x)$, then the constraint being enforced among $x$, $y$, and $z$ is

$$ y = z f(x/z). $$

Fig. 4. A degenerate example. The feasible set is the polygonal line (½, 0), (1, 1), (3, 2), and the optimal solution is (3, 2).

This is the most general section-wise linear homogeneous function of two 
variables, x and z, that one can have. 
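The homogeneous device can be checked numerically: scale the two adjacent weights for the point $x/z$ by $z$, and Eqs. (2) and (3) then give $x$ and $y = z f(x/z)$ for the polygonal $f$. The sketch below uses the breakpoints of Fig. 1 with $f(x) = x^2$, assumes $z > 0$, and uses a hypothetical function name.

```python
def homogeneous_weights(a, x, z):
    """Special-variable values with sum z (replacing Eq. (1)) and sum a_i x_i = x.

    Assumes z > 0 and that x / z lies within the increasing breakpoints a.
    Only two consecutive values are nonzero, so Eq. (3) yields y = z f(x / z)
    for the polygonal f through the points (a_i, b_i).
    """
    r = x / z
    for i in range(len(a) - 1):
        if a[i] <= r <= a[i + 1]:
            theta = (r - a[i]) / (a[i + 1] - a[i])
            w = [0.0] * len(a)
            w[i], w[i + 1] = z * (1.0 - theta), z * theta
            return w
    raise ValueError("x / z lies outside the breakpoints")


a = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
b = [ai ** 2 for ai in a]             # polygonal version of f(x) = x^2
w = homogeneous_weights(a, 1.0, 2.0)
y = sum(bi * wi for bi, wi in zip(b, w))
print(sum(w), y)                      # the sum is z = 2; y is about 2 * 0.26 = 0.52
```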

Another device is to use the same set of special variables in a multiple 
fashion. If, for example, one has two functions 

y = f (x) 
v =g(x) 

of the same argument, x, then the same set of special variables will work 
for both functions. One appends to (1), (2), and (3) the equation

$$ v = \sum_j d_j x_j, \qquad \text{where } d_j = g(a_j). $$



This can be extended to several functions of x. 
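A small sketch of the shared-variable device: one set of special-variable values on a common grid yields x together with interpolated values of every function of x. The helper name `evaluate_shared` is hypothetical. The weights are hard-coded here, whereas in the method they would come out of the simplex solution.

```python
def evaluate_shared(weights, grid, funcs):
    """Given special-variable values (weights) on a common grid, return the
    interpolated argument x and one interpolated value per function."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("special variables must sum to 1 (the convexity row)")
    x = sum(w * a for w, a in zip(weights, grid))
    values = [sum(w * g(a) for w, a in zip(weights, grid)) for g in funcs]
    return x, values


grid = [0.0, 1.0, 2.0, 3.0]
weights = [0.0, 0.5, 0.5, 0.0]   # two adjacent nonzero values, as condition (4) requires
x, (y, v) = evaluate_shared(weights, grid, [lambda a: a ** 2, lambda a: a ** 3])
# x = 1.5, y = 2.5 (chord of a**2), v = 4.5 (chord of a**3)
```

Each extra function adds only one row, d_j = g(a_j), to the columns that are already present.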

Many functions of several variables can be separated by an appropriate
change of variable and thus placed in separable programming format. 
For example, a quadratic form can always be diagonalized, which 
separates the form after a linear change of variables. As a special case of 
this, one can separate a product of two variables. For example, the ex- 
pression 



w = uv

can be separated by the substitutions

$$ w = x^2 - y^2, \qquad u = x + y, \qquad v = x - y. $$

Clearly this expression can also be separated by taking logarithms, provided u and v are positive: log w = log u + log v.
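The substitution can be verified numerically. Given u and v, the new variables are x = (u + v)/2 and y = (u - v)/2, and the product becomes a difference of two one-variable squares. The function name `separate_product` is illustrative.

```python
import math


def separate_product(u, v):
    """Return (x, y) with u = x + y and v = x - y, so that u*v = x**2 - y**2."""
    return (u + v) / 2.0, (u - v) / 2.0


u, v = 3.0, 5.0
x, y = separate_product(u, v)            # x = 4.0, y = -1.0
assert math.isclose(x * x - y * y, u * v)   # 16 - 1 = 15

# Logarithmic alternative, valid only when u > 0 and v > 0:
assert math.isclose(math.log(u) + math.log(v), math.log(u * v))
```

Each of x^2 and y^2 can then be approximated by its own set of special variables, as in the one-dimensional method.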

Limitations of the Method

Several features of the method stand out as limiting its range of application, so that it cannot be called a general-purpose nonlinear programming
procedure. Often the constraint set is not convex, so that there may be 
several extremal or locally optimum solutions. There is, in general, no 
way to show that the particular solution produced by the algorithm is a 
global optimum. Ordinarily one has smooth nonlinear functions, and the 
polygonal functions are merely approximations to them. The resulting 
polygonal model can in some cases have local optima which are not good 
approximations to local optima of the underlying smooth model, even 
though the objective value is nearly optimum. Estimating the extent of such 
discrepancies is a difficult problem, intimately related to an exhaustive 
sensitivity analysis of the model. The previously mentioned tricks can 
increase model size substantially and thus restrict the size of problem 
which can be computed economically. The number of simplex steps to 
optimum is increased by the use of many sets of special variables, and is 
further increased by the use of fine grids in the polygonal approximations. 
In extreme cases machine time has been four times what one would expect 
of a purely linear problem with the same size constraint matrix. Within 
these limits, however, the method has proved to be a dependable and 
effective technique for coping with many nonlinearities. It has been in 
productive use for over two years.

GENERALIZATIONS TO MULTI-DIMENSIONAL FUNCTIONS 

The ideas just described extend quite readily to functions of several variables, y = f(x), where y is real but x is a variable in m-dimensional space. With some modifications, to be discussed, this extension appears to be a workable method for solving general nonlinear problems, provided that the dimension of x is not large.

It is necessary to introduce the notion of a triangulation, the standard topological tool for defining piecewise linear functions. A triangulation of a region, R, of m-dimensional space is merely a partition of the region into m-dimensional simplexes, the latter being all convex combinations of m+1 points or vertices a_0, a_1, ..., a_m in m-dimensional space. The simplex is said to be spanned by its vertices. This isn't a complete definition of triangulation, but it will do for present purposes once the following two additional properties are noted:

A real-valued function on R is piecewise linear if it is linear over
every simplex of the triangulation. 

Simplexes of lower dimension than m are allowed. In fact every 
face of a simplex (a face is a lower dimensional simplex spanned by 
any subset of the vertices of the original simplex) is required to belong 
to the triangulation. 

Returning to the nonlinear programming problem, let y = f(x) be a real-valued function of an m-dimensional variable, x, and suppose one has a triangulation on the domain of x, with vertices a_0, a_1, ..., a_N. Then the relations (1), (2), (3), and (4) define a piecewise linear approximation to the function, with minor changes in definitions. Equation (2) must be interpreted as an m-dimensional vector equation; it is equivalent to m constraint equations, one for each coordinate of x. Condition (4) becomes:

No more than m+1 of x_0, x_1, ..., x_N can be nonzero, and these must belong to vertices which span a simplex in the triangulation. (11)

This representation of the function y = f(x) gives rise to a natural 
generalization of the criterion for selecting variables for pricing. 

Let S be a set of special variables and S_B ⊂ S those which are basic. Then if x_j is in S but not basic, x_j shall be allowed for pricing if and only if the vertices associated with S_B and x_j span a simplex in the triangulation. (12)

The algorithm discussed earlier is evidently this m-dimensional algo- 
rithm particularized to m = 1. 

This statement of the m-dimensional algorithm is deceptively simple. 
The difficulty is that a general triangulation is cumbersome to deal with 
and gives rise to awkward computer programming problems. It does not 
appear to be feasible to program a completely general algorithm, with the 
nature of the triangulation unspecified. It does seem feasible to handle 
a standard triangulation, though it hasn't been done. A reasonable ap- 
proach is described in the next section. 

Cubical Triangulations 

The triangulation of the m-cube described in [7] appears attractive as a 
standard triangulation on which to base a computer code. The vertices are 
lattice points in m-space (points whose coordinates are integers) in the 
domain of definition of the nonlinear function y = f(x). By choosing the unit 
of length in each coordinate one can control the fineness of the triangulation. 
A sequence of vertices

$$ a_0, a_1, \ldots, a_p \tag{13} $$

spans a simplex in this triangulation if and only if its members are vertices of a unit cube and can be ordered so that

$$ a_0 \le a_1 \le \cdots \le a_p $$

(the inequality to be interpreted coordinate-wise). This condition can be restated as

$$ a_0 \le a_1 \le \cdots \le a_p \le a_0 + I \tag{14} $$

where I = (1, 1, ..., 1).

The criterion (12) requires that all vertices be generated which can be adjoined to a given basic simplex (13) while still giving an allowed simplex. This can be done, since a new vertex, to be admissible, must be equal to a_p - I or a_0 + I (if a_0 + I ≠ a_p), or else must be insertable somewhere into the inequality sequence (14). Two adjacent terms in this sequence, say a_i and a_{i+1}, have integer coordinates differing by either 1 or 0. If the number of coordinates which differ by 1 is q, then there are exactly 2^q - 2 readily generated vertices between a_i and a_{i+1}.
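Condition (14) and criterion (12) can be sketched for lattice points. The sketch tests (14) directly and then finds the vertices allowed for pricing by brute force over the box [a_p - I, a_0 + I], which contains every admissible addition. This is a deliberate simplification rather than the incremental generation described above. The names `spans_simplex` and `allowed_for_pricing` are hypothetical, and the basic vertices are assumed to span a simplex already.

```python
from itertools import product
from typing import List, Sequence, Tuple

Vertex = Tuple[int, ...]


def _leq(u: Vertex, v: Vertex) -> bool:
    """Coordinate-wise u <= v."""
    return all(ui <= vi for ui, vi in zip(u, v))


def spans_simplex(vertices: Sequence[Vertex]) -> bool:
    """Condition (14): distinct lattice points that can be ordered so that
    a_0 <= a_1 <= ... <= a_p <= a_0 + I."""
    pts = sorted({tuple(v) for v in vertices}, key=sum)
    if not pts or len(pts) != len(vertices):
        return False  # empty set or a repeated vertex
    # In a coordinate-wise chain the coordinate sums strictly increase,
    # so sorting by sum recovers the only possible ordering.
    if not all(_leq(u, v) for u, v in zip(pts, pts[1:])):
        return False
    return _leq(pts[-1], tuple(c + 1 for c in pts[0]))


def allowed_for_pricing(basic: Sequence[Vertex]) -> List[Vertex]:
    """Criterion (12) by brute force: every lattice point that can be adjoined
    to the basic simplex and still span a simplex of the triangulation."""
    pts = sorted((tuple(v) for v in basic), key=sum)
    low = [c - 1 for c in pts[-1]]    # admissible points lie in [a_p - I, a_0 + I]
    high = [c + 1 for c in pts[0]]
    box = product(*(range(lo, hi + 1) for lo, hi in zip(low, high)))
    taken = set(pts)
    return sorted(v for v in box if v not in taken and spans_simplex(pts + [v]))


print(allowed_for_pricing([(0, 0)]))
# [(-1, -1), (-1, 0), (0, -1), (0, 1), (1, 0), (1, 1)]
print(allowed_for_pricing([(0, 0), (1, 1)]))
# [(0, 1), (1, 0)]   the 2**2 - 2 points between (0, 0) and (1, 1)
```

In one dimension, `allowed_for_pricing([(5,)])` returns the two neighbouring grid points, which matches the earlier rule that only adjacent special variables may be nonzero together.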

Column Generation 

The method for one-dimensional problems calls for precalculating the columns of the constraint matrix associated with each special variable, x_j. Since there is one x_j for each vertex a_j, this leads to a very large set of columns, most of which are in fact passed over during the computation. For example, a function of 3 variables with each coordinate partitioned into 10 divisions gives rise to 11 × 11 × 11 = 1331 vertices.

This can be avoided by using the revised simplex method and generating each column when it is needed for pricing. This is easy to mechanize, since the only nonzero entries are the coordinates of the vertex, a_j, the function evaluated at a_j, f(a_j), and 1. This requires a subroutine whose arguments are the coordinates of a_j and whose result is the desired column.
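A minimal sketch of such a column subroutine, together with the revised-simplex pricing step that would consume it. The row order (the m coordinate rows of (2), then the row defining y, then the convexity row), the grid spacing `h`, and a zero objective coefficient for special variables are all assumptions of this sketch. The function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def make_column(index: Sequence[int], h: float,
                f: Callable[[Tuple[float, ...]], float]) -> List[float]:
    """Column of the special variable at lattice point `index` of a grid with
    spacing h. Assumed row order: the m rows of (2), the row defining y, and
    the convexity row. All other entries of the column are zero."""
    a = tuple(h * i for i in index)      # coordinates of the vertex a_j
    return list(a) + [f(a), 1.0]


def reduced_cost(column: Sequence[float], prices: Sequence[float],
                 cost: float = 0.0) -> float:
    """Revised-simplex pricing: c_j - pi . A_j."""
    return cost - sum(p * c for p, c in zip(prices, column))


f = lambda a: a[0] ** 2 + a[1] ** 2
col = make_column((1, 2), 0.5, f)            # [0.5, 1.0, 1.25, 1.0]
d = reduced_cost(col, [1.0, 1.0, 1.0, 1.0])  # -3.75
```

Only the vertices that pass criterion (12) need to be generated and priced, so the full set of columns never has to be stored.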

Grid Refinement 

It is natural that a fine grid, or triangulation, will result in more iterations to reach optimum than a coarse grid. This is borne out by experience in the one-dimensional case. Therefore a grid refinement process seems in order, in which one iteratively solves the nonlinear problem, each time using a grid which is a refinement of the preceding one and using the old optimal solution as a starting point for the new problem.

The steps involved in this procedure are:

Step 1. Solve the problem initially, using a coarse grid for each nonlinear function.

Step 2. Refine the grid on each nonlinear function.

Step 3. For each nonlinear function, y = f(x), examine the old optimal value, x̄, of x. Find the simplex in the refined grid in which x̄ lies. Record its vertices.

Step 4. Re-solve the problem using the refined grid, starting the problem with the old optimal basis, as modified by Step 3 for each nonlinear function (i.e., for each set of special variables).

Step 5. If any grid is not yet satisfactorily fine, return to Step 2.
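For the cubical triangulation of (14), Step 3 can be sketched with the standard point-location rule for this triangulation (often attributed to Freudenthal and Kuhn): subtract the lower corner of the cube containing x̄, sort the fractional parts in decreasing order, and walk from a_0 to a_0 + I one unit step at a time. The sketch assumes that the refined grid is the same triangulation with a smaller spacing `h`; halving h, for example, gives a literal refinement. The function name and the tolerance `tol` are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def locate_simplex(x: Sequence[float], h: float,
                   tol: float = 1e-12) -> List[Tuple[Tuple[int, ...], float]]:
    """Step 3 sketch: find the simplex of the cubical triangulation with
    spacing h that contains x, and return its vertices (as lattice indices)
    with the barycentric weights of x."""
    u = [xi / h for xi in x]
    base = [math.floor(ui) for ui in u]
    t = [ui - bi for ui, bi in zip(u, base)]             # position inside the unit cube
    order = sorted(range(len(t)), key=lambda k: -t[k])   # largest fraction first
    ts = [t[k] for k in order]
    weights = [1.0 - ts[0]] + [ts[k - 1] - ts[k] for k in range(1, len(ts))] + [ts[-1]]
    vertices = [tuple(base)]
    v = list(base)
    for k in order:                                      # chain a_0 <= a_1 <= ... <= a_0 + I
        v[k] += 1
        vertices.append(tuple(v))
    # Dropping zero weights leaves a face when x lies on a lower-dimensional simplex.
    return [(vx, w) for vx, w in zip(vertices, weights) if w > tol]


# Old optimum x = (0.3, 0.7); refined grid spacing h = 0.5.
for vertex, weight in locate_simplex((0.3, 0.7), 0.5):
    print(vertex, round(weight, 6))
# (0, 1) 0.4
# (1, 1) 0.2
# (1, 2) 0.4
```

The returned weights are the starting values of the special variables for Step 4. Because they reproduce x̄ exactly, the modified basis represents the old solution on the new grid.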

Two points of caution need to be observed in applying grid refinement. 
First, it is important that the new grid be a literal refinement of the old; 
otherwise Step 3 will, in general, generate a simplex of higher dimension 
than its predecessor, and this will not yield a basis for Step 4. Second, 
the basis in Step 4 will, in general, turn out to be infeasible. Thus the underlying simplex algorithm must be able to cope with negative solution values. The Orchard-Hays composite algorithm [9] is designed to do this. However, to work efficiently with it, criterion (12) should be modified so that S_B is the set of basic variables whose solution value is nonnegative.

Although this procedure has not been automated, several experiments 
have been made to simulate it, using the separable codes discussed in 
Part I. This evidence suggests that an effective method of reducing total 
machine time is to start at Step 1 with very coarse grids.

Open Mathematical Questions 

The problem of degeneracy has not been satisfactorily solved, even 
though it has been found to be an insignificant operational problem. The 
problem is that if the algorithm terminates with a solution in which one or 
more of the basic special variables has value zero, there is no guarantee 
that the solution is locally optimal. Do recovery procedures exist which can be employed to continue in such circumstances?

Another problem, which is probably more fundamental, stems from the 
fact that the algorithm addresses itself to an approximate programming 
problem, rather than the original programming problem. What can one say about the quality of the solution obtained in this way? This latter problem can be put in better perspective by referring to the grid refinement procedure described above. In it, one finds exactly, barring degeneracy, a locally optimal solution V(j) to a programming problem P(j). One then uses V(j) as an aid in finding a solution V(j+1) to P(j+1). What can one say about the sequence V(0), V(1), ..., V(j), ...? Does it converge at all? If so, is its limit a locally optimal solution to P(∞), the underlying problem which P(j) approximates? Certainly P(j) converges to P(∞) in an appropriate sense. More generally, one would like to have some way of knowing that when P(j) is a good approximation to P(∞), then V(j) will be a good approximation to some locally optimal solution to P(∞).
MINIMIZATION OF INDEFINITE QUADRATIC FUNCTIONS

WITH LINEAR CONSTRAINTS 



Alex Orden 

ABSTRACT 

In the classical problem of minimization of a quadratic function subject 
to linear equations it is useful to introduce a basis for the null-space of the 
constraint matrix. The result provides a computation scheme and neces- 
sary and sufficient conditions for a minimum. 

Let Φ = PX + X^T QX be a quadratic function of a real n-component vector X, where P is a given vector and Q a given symmetric matrix. Φ is to be minimized under the constraints AX = B, where A is a given rectangular matrix and B a given vector.

Let N be a matrix whose columns span the null-space of A. An N
matrix can easily be computed as part of a Gauss-Jordan reduction on A. 

Necessary and sufficient conditions for existence of a minimum of Φ
are that both of the following hold: 

(a) That the linear system of equations 

$$
\begin{aligned}
2N^T Q X &= -N^T P\\
AX &= B
\end{aligned}
$$

be solvable for X. 

(b) That the matrix N^T QN be nonnegative (positive definite or positive semidefinite).

When both conditions hold, any solution of (a) is an X which minimizes Φ.

Q may be an indefinite matrix. When Q is a nonnegative matrix, condition (b) always holds, i.e., Q nonnegative implies that N^T QN is nonnegative.

Lagrange multiplier type equations may be used in place of (a). In place
of the above we can write: 

Necessary and sufficient conditions for existence of a minimum are that 
both of the following hold: 

(a) That the linear system of equations 

$$
\begin{aligned}
2QX + A^T W &= -P\\
AX &= B
\end{aligned}
$$

where W is a vector of Lagrange multipliers associated with the constraints, be solvable for X and W.

(b) That N^T QN be nonnegative.
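A minimal NumPy sketch of the test just stated. It departs from the abstract in two labelled ways. The null-space basis N is taken from a singular value decomposition instead of the Gauss-Jordan reduction. Solvability of system (a) is decided by a least-squares solve followed by a residual check. The function names and tolerances are illustrative.

```python
import numpy as np


def null_space(A, tol=1e-10):
    """Orthonormal basis (as columns) for the null-space of A, via the SVD."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    _, s, vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return vt[rank:].T


def minimize_quadratic(P, Q, A, B, tol=1e-9):
    """Minimize Phi = P.X + X'QX subject to AX = B.
    Returns a minimizing X, or None when condition (a) or (b) fails."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.asarray(B, dtype=float)
    n, m = Q.shape[0], A.shape[0]
    N = null_space(A)
    # Condition (b): N'QN positive definite or positive semidefinite.
    if N.shape[1] > 0 and np.linalg.eigvalsh(N.T @ Q @ N).min() < -tol:
        return None
    # Condition (a), Lagrange form: 2QX + A'W = -P, AX = B.
    K = np.block([[2.0 * Q, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-P, B])
    sol, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    if not np.allclose(K @ sol, rhs, atol=1e-8):
        return None  # the linear system has no solution
    return sol[:n]


Q = [[1.0, 0.0], [0.0, -1.0]]                         # indefinite
print(minimize_quadratic([0, 0], Q, [[0, 1]], [1]))  # approximately [0, 1]
print(minimize_quadratic([0, 0], Q, [[1, 0]], [0]))  # None, since N'QN = -1
```

The first call illustrates the abstract's point: Q is indefinite, but on the null-space of A the form is positive, so a minimum exists.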



Linear Programming under Uncertainty 



Albert Madansky 

This paper will describe some attempts at introducing "uncertainty," 
or, to be more precise, "risk" into the linear programming model. The 
distinction between "risk" and "uncertainty" is that in a risky situation 
one knows completely the probability distribution of the random variables, 
whereas in an uncertain situation one might know the probability distribu- 
tion except for, say, a parameter. The terminology in this area has grown 
up out of a paper by Dantzig [3], entitled "Linear Programming under 
Uncertainty," but what was really meant was "linear programming under 
risk." In any event, we shall assume that the probability distributions of 
any random variables which are introduced into the problem are com- 
pletely known. 

Let us take as the standard linear programming problem one given by constraints Ax ≥ b, x ≥ 0, where c'x is to be minimized. Here b is an m-vector, x and c are n_1-vectors, and A is an m × n_1 matrix. Risk can be introduced into the problem either in the coefficients of the objective function or in the constraints, that is, in the right-hand side b, in the matrix A, or in both. These two situations are clearly distinct.



THE STOCHASTIC OBJECTIVE FUNCTION 

There has been some work done on the problem of introducing risk into 
just the objective function. In this problem, the optimum vector x lies in 
the convex polyhedron defined by the inequalities Ax ^ b, x ^ 0, and the 
problem is just one of trying to find a vector in this polyhedron which minimizes an appropriate objective function for the risky situation. The appropriate objective function, as is well known, is obtained as follows: Consider
the utility of each possible value of the objective function, that is, the 
utility of c'x for each possible c. Then take the expected value of the 
utility of c'x over the distribution of the random vector c as the objective 
function to be minimized. If the utility function is linear in the objective 
function, then the problem reduces to one of merely looking at the inner 
product of the expected value of c with x, and this now becomes a non- 
stochastic linear programming problem. But whatever the nature of the 
utility function, the problem has been converted to one which is non- 
stochastic. 
In practice, the solution of this nonstochastic program may be very difficult to obtain, as it depends on the nature of one's utility function. Only one problem with nonlinear utilities has been studied in the literature. This is in a paper by Freund [6], studying a maximization problem in which the utility function was 1 - exp(-a c'x), where the vector c had a multivariate normal distribution. For such a problem, maximizing the expected utility of the objective function reduced to a quadratic programming problem.
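A short derivation shows why the expected utility reduces to a quadratic program. It assumes, as in the setting just described, that c is multivariate normal. The mean μ and covariance Σ are symbols introduced here for illustration, and the risk parameter a is assumed positive.

```latex
\begin{aligned}
c'x \sim N\big(\mu'x,\; x'\Sigma x\big)
  \;&\Longrightarrow\;
  E\big[1 - e^{-a\,c'x}\big]
  = 1 - \exp\!\Big(-a\,\mu'x + \tfrac{a^{2}}{2}\,x'\Sigma x\Big),\\[4pt]
\max_{x\ \text{feasible}} E\big[1 - e^{-a\,c'x}\big]
  \;&\Longleftrightarrow\;
  \max_{x\ \text{feasible}} \Big(\mu'x - \tfrac{a}{2}\,x'\Sigma x\Big)
  \qquad (a > 0).
\end{aligned}
```

The first line uses the moment generating function of a normal variable. The second holds because exp is increasing, so maximizing the utility means minimizing the exponent. Since Σ is positive semidefinite, the resulting objective is concave.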



STOCHASTIC CONSTRAINTS: THE "FAT" FORMULATION

A problem of quite different nature arises when one introduces risk into 
either the matrix A or the right-hand side b. In this situation one is con- 
fronted with the question of how to carry over the notion of feasibility, 
inherent in linear programming, to a " linear program" whose matrix and 
right-hand side are random variables. The various formulations in this 
area have been directed at different ways of answering this question. The simplest answer is embodied in the so-called "fat" formulation, characterized by the following reasoning: The decision maker has to decide on some vector x of activities before he can observe the values of A and b. After he has made his choice, he is confronted with particular A and b and can see whether or not x has satisfied the constraints. The difficulty, though, is that his prechosen x may not be feasible for the observed A and b. What the "fat" formulation prescribes is that one restrict oneself to the convex set of those x which are feasible no matter what values of A and b will subsequently be observed. That is, one looks at the intersection, over all A and b, of the polyhedra given by the constraints Ax ≥ b, x ≥ 0. The problem for the decision maker is one of finding the x in this set

S = {x | x ≥ 0, Ax ≥ b for all possible A and b},

the "permanently feasible" set, such that c'x is minimum.

It is easy to characterize an optimal solution for the "fat" formulation [8]. If x belongs to S and is optimal for any particular programming problem Ax ≥ b, x ≥ 0, c'x = min, for some possible value of A and of b, then x is also optimal for the "fat" formulation. One difficulty with this formulation is that it may not lead to a decision, because the permanently feasible set S may be empty. In problems in which the probability distribution of either A or b is defined over the entire real line, that event is likely. In this case the "fat" formulation is not going to help us even to the extent of presenting a problem which we can try to solve. But this formulation has been taken, and it is a point of view marked by extreme pessimism, characterized by the fact that one wants to preserve feasibility no matter what occurs. A variant of this formulation, which may not preserve feasibility, is the modification of the "fat" formulation requiring that one only be 100P% sure of being feasible. Then one would look at the set of nonnegative x's such that the probability that Ax ≥ b is at least P, and minimize c'x for x in this set.
THE "SLACK" FORMULATION

A more realistic statement of the problem is what we may call the "slack" formulation. It involves converting the problem to a two-stage
problem, which can be described roughly as follows: The decision maker 
is supposed to choose a nonnegative x, then observe a value of the random 
matrix A and the random vector b, and finally compare Ax with b. The 
vector x may or may not be feasible. But whether feasible or not, we are 
going to allow the decision maker after the fact to make another decision 
y to compensate for any discrepancies between Ax and b, based on his 
original decision x and the later-observed A and b, but at a penalty cost. 

The linear inventory problem is an example of this kind. Here x is the 
amount of inventory which the storekeeper must have on hand, b is the 
later-to-be-observed random demand, A is a nonrandom matrix of relevant 
technology coefficients, and y is the second-stage decision, embodying two kinds of activities. If the demand exceeds the inventory, the storekeeper must go out on the open market and at a penalty cost buy goods to take care of the excess of demand over supply. If the inventory exceeds the demand, then he will have to scrap the excess at a penalty, reflecting the loss to him due to not having made a better choice of x. This is a more realistic way of looking at the problem than the "fat" solution, in that it keeps the decision maker in business after he has made his choice of x and the random variables have been observed. The constraints for the two-stage problem are given by

Ax + By = b 

(We include, in the n_2-vector y, enough slack so that the inequality constraints Ax ≥ b are equalities.) Typically the m × n_2 matrix B is going to be a matrix of zeros and plus or minus ones. We require that x and y
be nonnegative. 

The objective function for this two-stage problem is constructed as fol- 
lows. Let f be the nonrandom penalty cost vector for the second-stage 
decision vector y. For given A, b, and x, we find the best second-stage 
decision, that is, the y which is optimal for 

$$
\begin{aligned}
By &= b - Ax\\
y &\ge 0\\
f'y &= \min
\end{aligned}
$$

Now, assuming that the utility of the objective function is linear, the appropriate objective for the two-stage problem is c'x + E min f'y.
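A sketch of this objective for the inventory example, under simplifying assumptions made only for illustration: a single item, A = 1, B = [1, -1] (buy the shortage or scrap the excess), and a demand b taking finitely many values. The numbers and function names are hypothetical. Because the objective is piecewise linear and convex in x, with breakpoints at the demand values, checking x = 0 and the demand values is enough, provided c + f_excess > 0.

```python
def second_stage_cost(x, b, f_short, f_excess):
    """min f'y subject to y_short - y_excess = b - x, y >= 0 (B = [1, -1])."""
    return f_short * max(b - x, 0.0) + f_excess * max(x - b, 0.0)


def expected_cost(x, demands, probs, c, f_short, f_excess):
    """c x + E min f'y for a demand taking finitely many values."""
    return c * x + sum(p * second_stage_cost(x, b, f_short, f_excess)
                       for b, p in zip(demands, probs))


demands, probs = [10.0, 20.0, 30.0], [1 / 3, 1 / 3, 1 / 3]
c, f_short, f_excess = 1.0, 3.0, 0.5
# Piecewise linear and convex in x: the minimum is at 0 or at a demand value.
candidates = [0.0] + demands
best = min(candidates, key=lambda x: expected_cost(x, demands, probs, c, f_short, f_excess))
print(best)  # 20.0
```

The costs at the candidate points are 60, 40, about 31.67, and 35, so holding 20 units balances the shortage penalty against the scrapping penalty.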

It is also assumed that for every possible x and (A,b) there exists a y 
which will compensate for any discrepancy between Ax and b, given that 
one has made the decision x and observed the particular (A, b). Rather than viewing this as an assumption, it may be taken as a definition of the domain of the vectors x which are admissible for consideration in minimizing this objective function. This "slack" formulation, it should be noted, reduces to the "fat" formulation in case any component of f is infinite, that is, in case it costs so much to compensate for certain types of discrepancies that one does not want to do so and therefore wants to be permanently feasible in the first stage. The above assumption, then, is the counterpart of the assumption in the "fat" formulation that S be nonempty.

Work on the solution of the "slack" formulation of the problem has been restricted to the case where only b is random. The case where A is also random is much harder. One direction of effort has been the search for a certainty equivalent, that is, a nonrandom vector with which one can replace the random vector b so that the solution of the resulting nonrandom problem will also be the solution of the two-stage problem. It is easy to see in simple examples that the expected value of b is not always a certainty equivalent, but there are some situations under which it is. One such circumstance is as follows: Let C(b,x) = c'x + min_y f'y. It is shown in [7] that if C(b,x) has the form

$$ C(b,x) = A_1(x) + A_2(b) + A_3(x)\,b, $$

then replacing b by its expected value and solving that nonstochastic problem will yield the solution of the two-stage problem. An example is a function which is quadratic in both x and b. Further, when the components of the vector b are independent and each has a uniform distribution over some finite range, then the part of the function EC(b,x) which is essential in the minimization is, under fairly wide circumstances, going to be of this quadratic nature [1], and the expected value solution will be the solution of this problem.



CHANCE CONSTRAINTS 

Another formulation of the problem is as a "chance-constrained" program. One looks at each of the constraining equations of the original problem

$$
\begin{aligned}
Ax &\ge b\\
x &\ge 0\\
c'x &= \min
\end{aligned}
$$

and specifies for each constraint a probability with which one wants this constraint to be achieved, whence the name "chance-constrained." Now, subject to these probabilistic constraints, one wishes to minimize c'x. This is reminiscent of the aforementioned variant of the "fat" formulation, except that one is here explicit about the probabilities of each possible infeasibility.
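For a single row with a fixed coefficient vector a and a random right-hand side, a chance constraint has a simple deterministic equivalent. If b is normal with known mean and standard deviation (an assumption made for this sketch), then P(a'x ≥ b) ≥ α holds exactly when a'x ≥ mean + std · z_α, where z_α is the standard normal quantile. The sketch uses Python's `statistics.NormalDist`. The function name is illustrative, and the value 0.99 echoes the "99% sure" example given below.

```python
from statistics import NormalDist


def deterministic_rhs(mean, std, alpha):
    """Smallest t with P(b <= t) >= alpha when b ~ Normal(mean, std).
    The chance constraint P(a'x >= b) >= alpha is then equivalent to a'x >= t."""
    return NormalDist(mean, std).inv_cdf(alpha)


t = deterministic_rhs(100.0, 10.0, 0.99)   # about 123.26
print(round(t, 2))
```

Raising α raises t, which is how a costly violation is reflected in this formulation without stating an explicit penalty.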

The difference between the chance-constrained formulation and the "slack" formulation is that in the latter the specific contingency plans of
the decision maker for each possible infeasibility are explicitly spelled 
out, as are the explicit costs for all the possible infeasibilities, whereas 
in the former these explicit costs of the various types of infeasibility are 
reflected in the probabilities associated with each of the constraints. If a 
violation of a particular constraint is going to be costly, in the "slack" 
formulation one would have to think hard about what the actual costs of the 
specific contingency plan under infeasibility would be, whereas in the 
chance-constrained formulation one might say: If violation of this con- 
straint is going to be very costly, I want to be 99% sure of satisfying this 
constraint. 

SOLVING TWO-STAGE PROBLEMS

Aside from the search for certainty equivalents for the two-stage problem, there has been some research on obtaining algorithms for minimizing EC(b,x). This work is contained in [4], based on the following considerations. Part of the two-stage problem is the second-stage problem, the problem that the decision maker has once he has made the initial decision x and observed the random vector b. This problem has the form:

$$
\begin{aligned}
By &= b - Ax\\
y &\ge 0\\
f'y &= \min
\end{aligned}
$$

Now for this problem, for given b and x, there is an optimal set of prices, or dual variables, π(b,x). It turns out that one way of characterizing the solution of the two-stage problem is in terms of the expected optimal price of the second-stage problem.

More specifically, the following three results are what led to the algo- 
rithms of [4]. 

I. Suppose x̄ is the optimal first-stage decision, i.e., it minimizes EC(b,x) and satisfies the constraints, and let x_1 be feasible. Then

$$ [c' - E\pi'(b,x_1)A]\,\bar{x} \le [c' - E\pi'(b,x_1)A]\,x_1; $$

that is, given any other feasible vector x_1, the optimum x̄ for the two-stage problem gives a value no larger than x_1 does for the linear form [c' - Eπ'(b,x_1)A]x.

This gives an inkling of a way of proceeding. One would hope to generate linear forms based on a particular choice of x_1 and an evaluation of the expected optimal price for the second-stage problem given x_1, and then determine whether there exists a vector which makes the above-mentioned linear form smaller than when evaluated at x_1.

II. EC(b,x) is convex in x. In other words, the "slack" formulation of the problem is in reality a recasting of the problem as a convex programming problem. Unfortunately the function EC(b,x), though convex, is not necessarily differentiable everywhere in the interior of the region of definition of x, so one cannot just take derivatives and set them equal to zero in order to find solutions. On the other hand, one can construct the supports to this convex function in terms of the expected optimal prices for the second-stage problem.

III. The plane given by

$$ [c' - E\pi'(b,x_1)A]\,x + E[\pi'(b,x_1)\,b] $$

is a support to EC(b,x) at x = x_1. That is, the term c' - Eπ'(b,x_1)A behaves as the gradient of this convex function at x_1.
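Results I to III can be illustrated on the one-item inventory example, with A = 1 and B = [1, -1] assumed for the illustration. The optimal second-stage price is f_short when demand exceeds x and -f_excess when it falls short. The quantity c - Eπ(b, x_1) is then the slope of a supporting line to EC at x_1. The demand values and costs are hypothetical, and the choice of price 0 when b = x is one arbitrary member of the optimal interval.

```python
def price(b, x, f_short, f_excess):
    """An optimal dual price pi(b, x) for the second-stage problem
    y_short - y_excess = b - x, y >= 0, minimize f_short*y_short + f_excess*y_excess."""
    if b > x:
        return f_short        # shortage: the buying activity is basic
    if b < x:
        return -f_excess      # surplus: the scrapping activity is basic
    return 0.0                # b == x: any price in [-f_excess, f_short] is optimal


def expected_cost(x, demands, probs, c, f_short, f_excess):
    """EC(b, x) = c x + E min f'y for finitely many demand values."""
    return c * x + sum(p * (f_short * max(b - x, 0.0) + f_excess * max(x - b, 0.0))
                       for b, p in zip(demands, probs))


def support_slope(x1, demands, probs, c, f_short, f_excess):
    """c - E pi(b, x1) A with A = 1: the slope of a support to EC at x1."""
    return c - sum(p * price(b, x1, f_short, f_excess) for b, p in zip(demands, probs))


demands, probs = [10.0, 20.0, 30.0], [1 / 3, 1 / 3, 1 / 3]
args = (demands, probs, 1.0, 3.0, 0.5)
for x1 in (15.0, 25.0):
    g = support_slope(x1, *args)
    # Result III: the line through (x1, EC(x1)) with slope g never lies above EC.
    assert all(expected_cost(z, *args) >= expected_cost(x1, *args) + g * (z - x1) - 1e-9
               for z in range(41))
    print(x1, round(g, 4))
# 15.0 -0.8333
# 25.0 0.3333
```

The sign change between x_1 = 15 and x_1 = 25 brackets the minimizer. The optimum x̄ = 20 gives a smaller value of each linear form g·x than the corresponding x_1 does, as Result I states.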

What we have is a combination of two results: one a result about the convexity of EC(b,x) and a characterization of its support planes, and the other a necessary condition for optimality of x. Using these in conjunction,
algorithms for minimizing this convex function can be developed. These 
are given in some detail in [4]. 

Another direction of effort has been in determining optimizing algo- 
rithms for the case in which the vector b takes on only a finite number of 
possible values. In that case, the problem can be written out in full as the 
following large linear programming problem: 

$$
\begin{aligned}
Ax + By_1 &= b_1\\
Ax + By_2 &= b_2\\
&\;\;\vdots\\
Ax + By_N &= b_N\\
c'x + p_1 f'y_1 + p_2 f'y_2 + \cdots + p_N f'y_N &= \min
\end{aligned}
$$

where the p's are now the probabilities of the various b's. In this format, one solves not just for the optimal x, but also for the whole set of optimal contingency plans, y_1, ..., y_N, whereas in the formulation as a convex programming problem one is only interested in determining the optimum first-stage decision. (The reason for this is that the decision maker is not directly interested in obtaining the entire set of contingency plans. The decision in the second stage depends only on the value of b that is observed, as well as on his first-stage decision x, and so the decision maker has a simple task. He doesn't care about all the possible situations that might have arisen. But in this way of setting up the finite problem one obtains all the contingency plans y_1, ..., y_N, as well as the first-stage optimal decision.) The dual of this large-scale program can be seen to be in suitable form for use of the decomposition algorithm [5], so that, though the linear program is a large-scale program, it is now feasible to handle it on a computer.
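A sketch of how the large program above can be assembled from its blocks, with the variables stacked as (x, y_1, ..., y_N). The helper name is hypothetical, and the sketch only builds the arrays. Any linear programming code, or a decomposition scheme exploiting the repeated B blocks, would then solve them. The inventory data (A = [1], B = [1, -1]) are assumed for illustration.

```python
import numpy as np


def deterministic_equivalent(A, B, c, f, scenarios, probs):
    """Assemble the large program for finitely many right-hand sides.
    Returns (cost, M, rhs): minimize cost . z subject to M z = rhs, z >= 0,
    where z = (x, y_1, ..., y_N)."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    m, n1 = A.shape
    n2 = B.shape[1]
    N = len(scenarios)
    M = np.zeros((m * N, n1 + n2 * N))
    for k in range(N):
        rows = slice(k * m, (k + 1) * m)
        M[rows, :n1] = A                                # the shared first-stage x
        M[rows, n1 + k * n2:n1 + (k + 1) * n2] = B      # the contingency plan y_k
    rhs = np.concatenate([np.asarray(b_k, dtype=float) for b_k in scenarios])
    cost = np.concatenate([np.asarray(c, dtype=float)]
                          + [p * np.asarray(f, dtype=float) for p in probs])
    return cost, M, rhs


# Inventory example: demand 10, 20, or 30 with equal probability.
cost, M, rhs = deterministic_equivalent([[1.0]], [[1.0, -1.0]], [1.0], [3.0, 0.5],
                                        [[10.0], [20.0], [30.0]], [1 / 3] * 3)
z = np.array([20.0, 0.0, 10.0, 0.0, 0.0, 10.0, 0.0])  # x = 20 and its best y_k
assert np.allclose(M @ z, rhs)
print(M.shape, round(float(cost @ z), 4))   # (3, 7) 31.6667
```

The matrix has one copy of A in every block row and a separate B block for each contingency plan. This repeated structure is what makes the decomposition approach attractive as N grows.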



DUALITY THEOREMS

Another direction of work in this area has been in obtaining duality theorems for the "slack" formulation of the problem. We briefly sketch what has been done in [9]. Let D = (A, B), e' = (c', f'), ξ'(b) = (x', y'(b)), and let μ be the distribution function of b. Let {ξ(b)} denote a collection of vectors ξ(b), indexed on b, where all the members ξ(b) of a collection {ξ(b)} have the same first n_1 components. The primal problem then becomes:

$$
\begin{aligned}
D\xi(b) &= b, \quad \xi(b) \ge 0, \quad \text{all } b\\
\int e'\xi(b)\,d\mu(b) &= \min.
\end{aligned}
$$

One can then write out the dual problem:

$$
\begin{aligned}
D'\delta(b) &\le e\\
\int b'\delta(b)\,d\mu(b) &= \max.
\end{aligned}
$$

Consider any particular collection {ξ(b)} now as the tabulation of a function ξ of b. Let {δ(b)} be a collection of dual vectors δ(b), indexed on b, and view any particular collection now also as the tabulation of a function δ of b. Assume that the functions ξ and δ are measurable and square integrable with respect to μ, and that the squared length of b is measurable and integrable with respect to μ. Then the Lagrangian for the problem is



One then obtains the usual kind of result, that 



if and only if ξ and δ are optimal for the primal and dual problems, respectively. Now these problems are not the original primal and dual problems, but have been restricted to those involving ξ and δ which are measurable and square integrable functions of b and for which the additional assumption is made that, roughly speaking, the distribution of b has finite variance. These certainly seem to be reasonable restrictions to place on any problems which will occur in practice.



RELATED WORK 

We will finally describe briefly the work of Tintner and his school [10]. 
They are not directly concerned with the problem of decision making 
under risk. Rather, they are interested in questions of the following form: What is the distribution, or at least what are the expected value and variance, of the objective function if one were to "wait-and-see" the value of the random A and b, and then solve that nonrandom problem? Explicit analytic results for this problem would be quite useful for the "slack" formulation, for they might enable one to find an analytic expression of E min f'y as a function of x. However, their work has shown that analytic expressions for particular distributions are hard to come by.
A PRIMAL-DUAL ALGORITHM FOR CONVEX PROGRAMMING

Robert Wilson 

ABSTRACT 

This paper develops an algorithm for exact solution of a broad class of 
practical convex programming problems, including, for example, stochastic 
linear programming with convex losses, and quadratic programming. The 
algorithm finds a vector x* yielding the minimal value of f(x) = c'x + g(Ax) 
within a bounded set of linear constraints, x ≥ 0 and Bx ≤ b, where b and c are vectors, A and B are matrices, and g(·) is a convex, continuously differentiable function of the components of a = Ax. Since A might include the identity as a submatrix, g(·) could be a function of x directly.

The Kuhn-Tucker conditions for a solution are analyzed to identify a 
natural dual problem for which the feasible subspace is a convex polyhedron. A mapping between the dual and primal spaces, used together with iterations of the ordinary simplex method, enables one to proceed to the solution. Cycling in the simplex algorithm between two vertices of the dual
feasible subspace identifies a nonvertex solution, and in this case a simple 
parametric technique suffices to yield the solution immediately. The 
method is illustrated by solving a stochastic linear programming problem. 

The algorithm takes its easiest form when g( - ) is a separable function 
of the components of a s Ax, due to resulting simplicities in the mapping 
process. When g( ) = the algorithm reduces to the simplex method 
applied to linear programming. 

Extending the analysis, it is shown that the method is applicable also to 
sequential convex programming problems. Although the numerical task 
now becomes burdensome, the method is illustrated on a sequential 
stochastic linear programming problem involving inventories (or back- 
orders) carried over from period to period. 






Characterizations by Chance-Constrained Programming



G. L. Thompson
W. W. Cooper 
A. Charnes 

It is useful to think of the models that go under the names of stochastic programming,
linear programming under "risk" (as it should be called, rather than "uncertainty"),
chance-constrained programming, etc., as originating in the problem

$$\begin{array}{ll} \max\; c^T x & \min\; w^T b \\ Ax \le b & w^T A \ge c^T \\ x \ge 0 & w \ge 0 \end{array}$$

which has the well-known features of duality. The problems arise, of 
course, when the vectors b and c, possibly the matrix A also, involve 
random variations. It is necessary to decide what is to be understood as 
the problem to be solved before asking how the problem is going to be 
solved. The direction we have taken is something like the following: 

First, let us set down one example of the intended class of problems. 
Suppose we want to maximize an expected value, e.g. 

$$\max\; E(c^T x)$$

subject to chance constraints. These conditions are that the probability
that certain inequalities are satisfied is at least $\alpha$. This is only a partial
prescription. We also require that the variables, the quantities $x$, arise
from some class of decision rules which might depend on the $A$'s, the $c$'s
and the $b$'s, where $A = (a_{ij})$, $b^T = (b_1, \ldots, b_m)$, $c^T = (c_1, \ldots, c_n)$. Thus our
chance constraints and decision rules may be written:

$$P\{Ax \le b\} \ge \alpha, \qquad x = D(A, c, b).$$

We are not merely trying to find mixed strategies. We are rather
interested in determining decision rules which will tell us (assuming that this
is a problem that marches forward in time) what to do at each emerging
stage; not what we should do with a certain probability, as in the case of a
mixed strategy in game theory, or in the case of a stochastic model of
Markov process type, where we determine conditional (transition)
probabilities of various actions. To fix the ideas one might keep in mind, say,
scheduling the production of heating oil to meet the emerging random
demands through the season as well as to meet whatever other constraints
are relevant on storage and transportation.

The terminology we use overlaps with that of others, and the meanings
have become mixed up, as is to be expected. Two terms arise from the
formulation above. The first is "deterministic equivalent"; by this we mean
a problem not involving any random variables which, when solved, will give
us optimal decision rules to use. And we speak of the "certainty equivalent"
as being this set of optimal decision rules, because, having these, we know
with certainty what we are to do in any specific case.

How does the so-called linear programming under uncertainty of Dantzig,
Madansky, et al., relate to chance-constrained programming? Much of
their work is concerned with the so-called two-stage instance of linear
programming under uncertainty. Referring to our chance-constrained
formulation, their variables (and here we include their second-stage
penalty variables) are those that have $c$'s attached to them. Then, with
these, you must restrict to the special case in which the $\alpha$'s are all 1 in
order to have linear programming under uncertainty. This is true whether
it is two-stage or $k$-stage. Further, the concept of decision rules and their
optimal determination is absent in any explicit sense from the published work
of this "l.p.u.u." group except in the two-stage case, to which we shall
return.

We have, however, been able to get some characterizations of optimal
classes of decision rules for general l.p.u.u. by restricting our formulation
to $\alpha_i = 1$. For example, in the case of analyticity of the decision rules in
the random variables, it turns out that the class of linear rules is sufficient
to consider, and piecewise analyticity would yield piecewise linear rules.
For the general case ($\alpha_i \ne 1$) we have been able to make a certain amount
of progress in finding deterministic equivalents. We have looked at this
chiefly, but not exclusively, in terms of linear decision rules,

$$x = Db,$$

where $D$ is a matrix. If, in a dynamic problem, the components of $x$ with
larger subscripts represent later times, $D$ may have a triangular or a
block triangular structure. But there are cases when $D$ does not have
such a structure. Also, if forecasts of some of the random variables are
included in the decision rules, a forecast could appear as a random variable
which is available with (perhaps) smaller variance at certain times along
the pathway than the random variable available at an earlier stage.

In any case, if we look at this class of linear decision rules, it is clear
that in order to do something of a general nature with deterministic
equivalents we must rely on a class of distributions for which the algebra is
easy. One such class (suggested by Charnes and Ben-Israel) is
that of mixtures of normal distributions. This class has the property
that a linear combination of random variables which are governed by a
(multivariate joint) mixture of normal distributions is again governed by
such a mixture, and one can specify the corresponding means and
covariances in a fairly reasonable manner. Further, with mixtures of normal
distributions it is possible to approximate distributions of fairly arbitrary
shapes; U-shapes and all that sort of thing can be had. Thus one can build up
a more general theory through approximation by mixtures of this type.

In this linear case, there is one more point worth noting. The case of
two-stage linear programming under uncertainty is one in which both
$\alpha_i = 1$ (for all $i$) and $x = Iy$, where $y$ does not depend on $b$. In other words, $x$
is a vector of parameters; the decision rule degenerates so that there are
no terms involving the random variables $b$. We call this the zero-order
linear decision rule. For the more general case with $\alpha_i \ne 1$, Ben-Israel
and Charnes have given a dual theorem. For that case, the left-hand side
in each chance constraint is a number; it isn't a random variable. But the
probability of a random variable with a known distribution being greater
than or equal to some particular number of course specifies a fractile, and
leads immediately out of a chance constraint into a constraint involving the
corresponding fractile of the distribution. This result holds also for
dependent, not just independent, random variables.
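The passage from a chance constraint to a fractile constraint can be checked numerically. In the minimal sketch below, the constraint is assumed to be $P\{a_i^T x \le b_i\} \ge \alpha_i$ with a zero-order rule, so $a_i^T x$ is a number, and $b_i$ is assumed normal purely for illustration; any continuous distribution with an inverse CDF would work the same way. Since $P\{b_i \ge s\} = 1 - F_i(s)$, the constraint is equivalent to $a_i^T x \le F_i^{-1}(1 - \alpha_i)$. The helper names are hypothetical.

```python
from statistics import NormalDist


def fractile_bound(dist, alpha):
    """Largest s with P{b >= s} >= alpha, i.e. s = F^{-1}(1 - alpha) for continuous b."""
    return dist.inv_cdf(1.0 - alpha)


def prob_at_least(dist, s, n=200_000, seed=1):
    """Monte Carlo estimate of P{b >= s}."""
    draws = dist.samples(n, seed=seed)
    return sum(d >= s for d in draws) / n


if __name__ == "__main__":
    b_i = NormalDist(mu=100.0, sigma=15.0)  # hypothetical distribution of b_i
    alpha = 0.95
    s = fractile_bound(b_i, alpha)
    # Deterministic equivalent: a_i^T x <= s. With a_i^T x = s the chance
    # constraint should hold with probability close to alpha.
    print(f"a_i^T x <= {s:.2f}; simulated P = {prob_at_least(b_i, s):.4f}")
```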

Three types of functionals for chance-constrained models have come up
in connection with various problems in the real, or conceptual real, world.
One of these (pointed out already) is that where the functional is the
expected value. We call this the E type. Another type (the V type) is that
in which, as for example in the work of Markowitz on portfolio selection
and investment, something like a variance measure of risk is minimized.
This could be written $\min E(c^T x - \bar c^T \bar x)^2$, where bars denote expected values.
Finally, the goal might be to maximize the probability that at least a certain
level of the functional is achieved, again subject to chance constraints and
the choice of the decision rule from a given class. Call this the P type.

There is still another class of problem which has come into the social
science literature through the notion (due to H. A. Simon) of "satisficing"
rather than optimizing. But it too can be specified in a certain way in an
extremal or optimizing manner.

What do these problems look like for the case in which we have a linear
decision rule? It turns out that in this case, when the distributions are
mixtures of normal distributions, we can get a deterministic equivalent.
The results are more general, but this level of generality will do. The
components of $b$ and $c$ may be correlated. There is one more condition:
if the distributions are symmetric, then we require $\alpha_i \ge 1/2$ for all $i$. If
this isn't the case, then the deterministic equivalent is not a convex
programming problem. But it is interesting that then you really would not
consider such a chance constraint as a policy constraint, or as very much
of a requirement, since it would not be required to hold even 50% of the time.
When the hypothesis is satisfied (the usual case), the problem is convex and
the deterministic constraints will turn out to be at most quadratic. The quadratic
character arises from the covariances of the components of the random
vectors involved.

Incidentally, we have said "convex" programming problem for each
functional type. That is not true directly for the P type. It leads to a model
involving programming over a convex set, but the functional is a linear
fractional one. We have shown how to reduce this to a model with a linear
objective, one extra constraint, and one more variable, so it is true that at
most a second transformation converts all three types to convex
programming models with at most quadratic constraints as their deterministic
equivalents.
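The reduction of a linear fractional functional to a linear one can be sketched with the standard Charnes-Cooper change of variables. Whether this is exactly the transformation used here for the P type is an assumption. The symbols $p, p_0, q, q_0, t, y$ are introduced only for this sketch, and it is assumed that $q^T x + q_0 > 0$ on a nonempty, bounded feasible set.

```latex
\begin{aligned}
&\max_{x}\ \frac{p^T x + p_0}{q^T x + q_0}
  \quad \text{s.t.}\quad Ax \le b,\; x \ge 0,\\[4pt]
&\text{with } t = \frac{1}{q^T x + q_0},\quad y = t\,x:\\[4pt]
&\max_{y,\,t}\ p^T y + p_0 t
  \quad \text{s.t.}\quad Ay - bt \le 0,\; q^T y + q_0 t = 1,\; y \ge 0,\; t \ge 0.
\end{aligned}
```

The normalization $q^T y + q_0 t = 1$ is the one extra constraint and $t$ the one extra variable. An optimal $(y, t)$ with $t > 0$ gives back $x = y/t$.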

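For example, let us write down the deterministic equivalent for the V
type. With $x = Db$, $\bar x = D\bar b$, and bars denoting expected values, it will be:

$$\min\; E\,[c^T D b - \bar c^T \bar x]^2$$

subject to

$$-a_i^T D \bar b - v_i \ge -\bar b_i,$$

$$-K_{\alpha_i}^2\, E\,[(b_i - a_i^T D b) - (\bar b_i - a_i^T D \bar b)]^2 + v_i^2 \ge 0, \qquad v_i \ge 0.$$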

Solving this problem we would get D and thus would have the certainty 
equivalent; that is, we would be able to specify our action since the corre- 
sponding x components would be precisely determined as specific events 
developed which gave particular values to the b. 
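The two constraints of each pair can be checked for a given rule $D$. The minimal sketch below rests on several assumptions: $b$ is a single multivariate normal rather than a mixture, $K_{\alpha_i} = \Phi^{-1}(\alpha_i)$, and $v_i$ takes its smallest admissible value $K_{\alpha_i}$ times the standard deviation of the slack $z_i = b_i - a_i^T D b$. Under these assumptions the pair reduces to $E z_i \ge K_{\alpha_i}\sqrt{\operatorname{Var} z_i}$. The matrices and helper names are hypothetical, and a Monte Carlo run checks the implied probability.

```python
import math
import random
from statistics import NormalDist


def cholesky(S):
    """Lower-triangular L with L L^T = S (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def slack_moments(a, D, i, mean_b, cov_b):
    """Mean and standard deviation of z = b_i - a^T D b under the rule x = D b."""
    m = len(mean_b)
    aD = [sum(a[r] * D[r][c] for r in range(len(a))) for c in range(m)]
    w = [(1.0 if c == i else 0.0) - aD[c] for c in range(m)]  # z = w . b
    mu = sum(w[c] * mean_b[c] for c in range(m))
    var = sum(w[r] * cov_b[r][c] * w[c] for r in range(m) for c in range(m))
    return mu, math.sqrt(var)


def deterministic_equivalent_holds(a, D, i, mean_b, cov_b, alpha):
    """Check a^T D bbar + v_i <= bbar_i with v_i = K_alpha * sd(z)."""
    mu, sd = slack_moments(a, D, i, mean_b, cov_b)
    return mu >= NormalDist().inv_cdf(alpha) * sd


def simulated_probability(a, D, i, mean_b, cov_b, n=50_000, seed=2):
    """Monte Carlo estimate of P{a^T D b <= b_i} for jointly normal b."""
    rng = random.Random(seed)
    L = cholesky(cov_b)
    m = len(mean_b)
    hits = 0
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(m)]
        b = [mean_b[r] + sum(L[r][k] * e[k] for k in range(r + 1)) for r in range(m)]
        x = [sum(D[r][c] * b[c] for c in range(m)) for r in range(len(D))]
        hits += sum(a[r] * x[r] for r in range(len(a))) <= b[i]
    return hits / n


if __name__ == "__main__":
    a, D = [1.0, 1.0], [[0.5, 0.0], [0.0, 0.3]]
    mean_b, cov_b = [10.0, 5.0], [[4.0, 1.0], [1.0, 9.0]]
    print(deterministic_equivalent_holds(a, D, 0, mean_b, cov_b, 0.95),
          simulated_probability(a, D, 0, mean_b, cov_b))
```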

Similar results hold for the other types of models; the same sort of
structures arise. It follows from the above expected-value form of the
chance constraints that the quadratic character of the equivalent
constraints comes essentially from the variances or covariances involved. At
the very worst, to deal directly with distributions other than mixtures of
normal distributions, it is only necessary to determine quantities such as
$K_{\alpha_i}^2$. These may come from a method of parametric variation or from things
like Chebyshev's inequality. This would yield values of $K^2$ which would be
higher than necessary in order that these chance constraints hold.
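The sketch below shows how much larger a distribution-free $K$ is than the normal fractile. It assumes that $K$ enters as "mean plus $K$ standard deviations", as in the constraints above. The text does not say which form of Chebyshev's inequality is meant, so the sketch shows both the two-sided form and its one-sided (Cantelli) variant. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def k_normal(alpha):
    """K with P{Z <= K} = alpha for a standard normal Z (exact for a normal slack)."""
    return NormalDist().inv_cdf(alpha)


def k_cantelli(alpha):
    """Distribution-free K from the one-sided Chebyshev (Cantelli) inequality.

    P{X - mu >= k*sigma} <= 1 / (1 + k^2), so k^2 = alpha / (1 - alpha)
    guarantees P{X < mu + k*sigma} >= alpha for any finite-variance X.
    """
    return math.sqrt(alpha / (1.0 - alpha))


def k_chebyshev(alpha):
    """Distribution-free K from the two-sided Chebyshev inequality: k^2 = 1 / (1 - alpha)."""
    return math.sqrt(1.0 / (1.0 - alpha))


if __name__ == "__main__":
    for alpha in (0.6, 0.9, 0.95, 0.99):
        print(f"alpha={alpha}: normal {k_normal(alpha):.3f}, "
              f"Cantelli {k_cantelli(alpha):.3f}, Chebyshev {k_chebyshev(alpha):.3f}")
```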

Let us proceed now to something which is a good deal more special.
We've done some work on extension of the critical path problem to one
involving uncertainty in the times of completion of each task in the network
of required tasks. Here, consider a network with a starting node, a finishing
node and various other nodes. A unit amount is sent in which is required to go
through the network and come out at the finish node in correspondence to
total completion of all the tasks implied by all the links. Recall that in this
formulation one searches for the chain of maximal length in the network:
maximize $\sum_j t_j x_j$, where $t_j$ is the time on the $j$th link and $x_j$ the
amount of flow there, subject to the incidence conditions and unidirectional
flow,

$$\sum_j e_{ij} x_j = a_i, \qquad x_j \ge 0.$$

This, of course, is associated with a dual problem:

$$\min \sum_i u_i a_i \quad \text{subject to} \quad \sum_i u_i e_{ij} \ge t_j.$$

Here all the $a$'s except two will be zero. One of these is 1 and the other is
$-1$.

If you look for the moment at the deterministic case in this form, and
you look for the obvious directed sub-dual method (see Charnes and
Cooper, "Mathematical Methods and Industrial Applications of Linear
Programming," Volume II), it turns out that it's possible to solve it in one
pass. And the method that you get in this manner turns out to be
one which Dijkstra published in Numerische Mathematik, Volume I, 1959.
This method seems to be an improvement over previous solution methods.
It's rather interesting that his ingenious method, which is based purely on
graph considerations, is identical with one you get from simply a routine
examination of the problem and construction of an obvious directed sub-dual
algorithm in the dual.
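The early-start interpretation of the dual variables can be illustrated with a short sketch. It computes the potentials $u$ with the usual longest-path recursion in topological order. This is a standard technique and is not a reproduction of Dijkstra's published procedure, which addresses shortest paths. The sketch assumes an acyclic network and the incidence convention $e_{ij} = +1$ at the head of link $j$ and $-1$ at its tail, so that the dual constraints read $u_{\text{head}} - u_{\text{tail}} \ge t_j$. The network and function names are hypothetical.

```python
from collections import deque


def early_start_times(n_nodes, arcs, start=0):
    """Longest-path potentials u[v] from `start` in an acyclic network.

    arcs: list of (tail, head, duration). u[v] is the earliest time by which
    all tasks leading into node v can be finished; pred[v] records the node
    from which the maximum is attained, for recovering a critical path.
    """
    succ = [[] for _ in range(n_nodes)]
    indegree = [0] * n_nodes
    for tail, head, t in arcs:
        succ[tail].append((head, t))
        indegree[head] += 1
    u = [float("-inf")] * n_nodes
    pred = [None] * n_nodes
    u[start] = 0.0
    queue = deque(v for v in range(n_nodes) if indegree[v] == 0)
    processed = 0
    while queue:  # Kahn's topological order
        v = queue.popleft()
        processed += 1
        for w, t in succ[v]:
            if u[v] + t > u[w]:
                u[w], pred[w] = u[v] + t, v
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    if processed != n_nodes:
        raise ValueError("the network contains a cycle")
    return u, pred


def critical_path(pred, finish):
    """Follow predecessor links back from the finish node."""
    path = [finish]
    while pred[path[-1]] is not None:
        path.append(pred[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    arcs = [(0, 1, 3), (0, 2, 2), (1, 3, 4), (2, 3, 6), (3, 4, 1), (1, 4, 2)]
    u, pred = early_start_times(5, arcs)
    # Dual feasibility: u[head] - u[tail] >= t_j for every link.
    assert all(u[h] - u[tl] >= t for tl, h, t in arcs)
    print("early start times:", u, "critical path:", critical_path(pred, 4))
```

The project length is $u_{\text{finish}} - u_{\text{start}}$, which is the dual objective $\sum_i u_i a_i$ under the stated sign convention.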

In the dual, $\min \sum_i u_i a_i$ subject to $\sum_i u_i e_{ij} \ge t_j$, the $u_i$ may be considered
virtual potentials at the nodes. Their optimal values can be interpreted as
"early start times" for their following tasks. And it is from this side, too,
that we can take this problem up into chance-constrained form, where we
replace the deterministic constraints above by the requirement that the
probability that they hold is at least $\beta_j$, respectively; e.g., the constraints become
$P(\sum_i u_i e_{ij} \ge t_j) \ge \beta_j$. Here the $t_j$ are random variables with known
distributions, and for simplicity we consider zero-order linear decision rules
for the $u_i$'s ("two-stage" in the linear programming under uncertainty
terminology; note this is not l.p.u.u., since the $\beta_j$ are not necessarily 1), i.e.,
we take the $u_i$'s as parameters to be solved for. Now we can do something
like minimize the expected value of the functional, or we can minimize
the probability that the time taken for completion is more than a certain
amount. Or we can maximize the probability that the time is less than a
certain amount for completion. Any one of these criteria would give us a
perfectly valid chance-constrained zero-order model.

To show the relevance of this to the PERT procedure, let us suppose, for
example, that the $t$'s are independent with distribution functions $F_j$. Then
the chance constraints can be inverted immediately into the fractile form,

$$\sum_i u_i e_{ij} \ge F_j^{-1}(\beta_j).$$

Taking the E form for the functional, it remains as $\sum_i u_i a_i$, since the $u_i$ are
not random variables.

Now we have a dual problem:

$$\max \sum_j F_j^{-1}(\beta_j)\, x_j \quad \text{subject to} \quad \sum_j e_{ij} x_j = a_i \quad \text{and} \quad x_j \ge 0.$$

This is of the same form as the original deterministic critical path
formulation, with the $F_j^{-1}(\beta_j)$ replacing the (fixed) times for task completions.
PERT arises for special discrete (3-point) distributions and with
replacement of the random times by their expected values. But this corresponds
here to $\beta_j \approx 1/2$. You may form your own conclusions as to the
protection afforded by this level of $\beta_j$ in chance constraints.

Aside from the above considerations, this model has other interesting
aspects. For example, we have obtained the (Tintner) variety of stochastic
programming solution for some cases. When one uses exponential
distributions, the distribution of the maximum of two random variables and
other necessary distributions turn out to be easy to determine. The
integrations involved can be carried out, and for some simple examples we can
actually get the distribution for the minimum total completion time.
Although the mode of individual task times is at zero, the total completion
time distribution is very flat and small near zero. Further, it is often
multimodal! This is at variance with certain (fallacious) central limit
theorem usages.
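The gap between a PERT-style answer and the actual completion-time distribution can also be explored by simulation, as an alternative to the analytic integrations described above. The minimal sketch below assumes a hypothetical acyclic network whose nodes are numbered so that every link runs from a lower to a higher node number, with independent exponential task times of given means. It compares the critical-path length computed from expected times with the simulated mean and deciles of the total completion time.

```python
import random
import statistics


def completion_time(n_nodes, arcs, durations):
    """Longest path from node 0 to node n_nodes - 1.

    Assumes every arc (tail, head) has tail < head, so processing arcs in
    order of their tails visits nodes in a valid topological order.
    """
    u = [float("-inf")] * n_nodes
    u[0] = 0.0
    for (tail, head), d in sorted(zip(arcs, durations)):
        u[head] = max(u[head], u[tail] + d)
    return u[-1]


def simulate(n_nodes, arcs, means, n=20_000, seed=3):
    """Sample total completion times with independent exponential task times."""
    rng = random.Random(seed)
    return [completion_time(n_nodes, arcs, [rng.expovariate(1.0 / m) for m in means])
            for _ in range(n)]


if __name__ == "__main__":
    arcs = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (1, 4)]
    means = [3.0, 2.0, 4.0, 6.0, 1.0, 2.0]
    pert_like = completion_time(5, arcs, means)  # expected times plugged in
    samples = simulate(5, arcs, means)
    deciles = statistics.quantiles(samples, n=10)
    print(f"critical path with expected times: {pert_like:.2f}")
    print(f"simulated mean: {statistics.mean(samples):.2f}, "
          f"deciles: {[round(q, 1) for q in deciles]}")
```

Because the completion time is a maximum of sums, its expectation is at least the critical-path length computed from expected times, so plugging in means tends to be optimistic.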
PROGRAMMING WITH STANDARD ERRORS IN THE
CONSTRAINTS AND THE OBJECTIVE



S. M. Sinha

ABSTRACT 

This paper deals with a linear programming problem where the
coefficients of the objective, the constraint inequalities and the available
quantities are random variables. The appropriate formulation in this situation
is then to require that our activities be such that, with a certain
preassigned high probability, the total quantities required for each item
do not exceed the available quantities, while at the same time guaranteeing
a maximum objective with a preassigned high probability. With the
assumption that at least the means, variances and covariances of these random
variables are known, our formulation reduces the stochastic linear
programming problem to the following convex programming problem:

$$\max\; D^T X - (X^T B X)^{1/2}$$

$$\text{subject to} \quad A_i^T X + (X^T B_i X)^{1/2} \le b_i, \quad X \ge 0 \quad (i = 1, 2, \ldots, m), \qquad (1)$$

where $D$, $A_i$, $X$ are $(n \times 1)$ vectors and $B$, $B_i$ are $(n \times n)$
symmetric positive semidefinite matrices.

It has been shown that in a particular case, where only the coefficients of
the objective are random variables, the problem can be stated as

$$\max\; D^T X - (X^T B X)^{1/2} \quad \text{subject to} \quad AX \le b,\; X \ge 0,$$

where $b$ is an $(m \times 1)$ vector and $A$ is an $(m \times n)$ matrix,
which can be solved by the available algorithms for quadratic programming.

It is also noted that if all the correlation coefficients are unity, (1) reduces 
to a linear programming problem with known coefficients. 
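The last remark can be checked directly. If every correlation coefficient is 1, the covariance matrix has the form $B = \sigma\sigma^T$ for a vector of standard deviations $\sigma \ge 0$. For $X \ge 0$ the term $(X^T B X)^{1/2}$ then collapses to the linear expression $\sigma^T X$, so both objective and constraints become linear. With independent coefficients the term does not collapse. The numbers in this minimal sketch are hypothetical.

```python
import math


def sqrt_quadratic_form(x, B):
    """(x^T B x)^{1/2}, the standard-deviation term in the objective and constraints."""
    n = len(x)
    return math.sqrt(sum(x[i] * B[i][j] * x[j] for i in range(n) for j in range(n)))


sigma = [2.0, 0.5, 1.5]  # hypothetical standard deviations
x = [1.0, 4.0, 2.0]      # any X >= 0

perfectly_correlated = [[si * sj for sj in sigma] for si in sigma]
independent = [[sigma[i] ** 2 if i == j else 0.0 for j in range(3)] for i in range(3)]

linear_term = sum(s * xi for s, xi in zip(sigma, x))
print(sqrt_quadratic_form(x, perfectly_correlated), linear_term)  # equal: the term is linear in x
print(sqrt_quadratic_form(x, independent))                        # smaller, and nonlinear in x
```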
INEQUALITIES FOR STOCHASTIC NONLINEAR
PROGRAMMING PROBLEMS



O. L. Mangasarian and J. B. Rosen

ABSTRACT 

The inequalities given by Albert Madansky (Management Science, Vol. 6,
1960, p. 200) have been generalized to a class of nonlinear programming
problems via the duality theorems of nonlinear programming. In particular,
the constraints considered are of the type $g(x) + h(y) \ge b$, where the
components of the vectors $g$ and $h$ are nonlinear concave functions of their
arguments and satisfy some further restrictions. The right-hand side $b$ is
subject to a random variation with an expected value $Eb$. It is desired to
minimize the expected value of the convex objective function $\varphi(x) + \psi(y)$
subject to the constraints. If $\gamma(b, x)$ denotes $\min_y [\varphi(x) + \psi(y)]$ subject to the
constraints, then under certain restrictions the following inequalities hold:

$$E\gamma(b, x(Eb)) \ge \min_x E\gamma(b, x) \ge E \min_x \gamma(b, x) \ge \min_x \gamma(Eb, x),$$

where $x(Eb)$ denotes the solution of $\min_x \gamma(Eb, x)$. It is also shown that the
function $\min_x \gamma(b, x)$ is a convex, continuous function of $b$, and that the
sometimes-sharper upper bound to $E \min_x \gamma(b, x)$ given on p. 201 of Madansky also
holds if $b$ is defined over a bounded rectangle and has independent elements.
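The chain of inequalities can be checked on a toy problem. The minimal sketch below assumes a hypothetical one-dimensional recourse model $\gamma(b, x) = x + 2\max(b - x, 0)$, in which the recourse $y = \max(b - x, 0)$ satisfies $x + y \ge b$ at cost 2 per unit. The first-stage capacity is $0 \le x \le 3$, and $b$ takes the values 1, 2, 9 with equal probability. Minimization over $x$ is done by grid search, which is exact here because all breakpoints lie on the grid.

```python
def gamma(b, x):
    """Hypothetical recourse cost: x plus 2 per unit of shortfall y = max(b - x, 0)."""
    return x + 2.0 * max(b - x, 0.0)


scenarios = [1.0, 2.0, 9.0]           # equally likely values of b
grid = [i / 100 for i in range(301)]  # feasible x in [0, 3]


def expect(f):
    """Expectation over the equally likely scenarios."""
    return sum(f(b) for b in scenarios) / len(scenarios)


Eb = expect(lambda b: b)
x_Eb = min(grid, key=lambda x: gamma(Eb, x))  # solution of the mean-value problem

upper = expect(lambda b: gamma(b, x_Eb))                         # E gamma(b, x(Eb))
here_and_now = min(expect(lambda b: gamma(b, x)) for x in grid)  # min_x E gamma(b, x)
wait_and_see = expect(lambda b: min(gamma(b, x) for x in grid))  # E min_x gamma(b, x)
lower = min(gamma(Eb, x) for x in grid)                          # min_x gamma(Eb, x)

print(upper, here_and_now, wait_and_see, lower)
assert upper >= here_and_now >= wait_and_see >= lower
```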
Compact Basis Triangularization for the Simplex Method



George B. Dantzig 

Alex Orden was the first to point out that the inverse of the basis in the 
simplex method serves no function except as a means for obtaining the 
representation of the vector entering the basis and for determining the 
new price vector. For this purpose one of the many forms of "substitute 
inverses" (such as the well known product form of the inverse) would do 
just as well and in fact may have certain advantages in computation. 

Harry Markowitz was interested in developing, for a sparse matrix, a 
substitute inverse with as few nonzero entries as possible. He suggested 
several ways to do this approximately. For example, the basis could be 
reduced to triangular form by successively selecting for pivot position that 
row and that column whose product of nonzero entries (excluding the pivot) 
is minimum. He also pointed out that, for bases whose nonzeros appear in 
a band on a staircase about the diagonal, proper selection of pivots could 
result in a compact substitute inverse with no more nonzeros than the 
original basis. 
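Markowitz's pivot rule can be sketched on the sparsity pattern alone. The minimal sketch below assumes a structurally nonsingular matrix and ignores numerical pivot-size tests. At each step it chooses the nonzero whose (row count - 1) × (column count - 1) in the remaining submatrix is smallest, performs the elimination symbolically, and counts fill-in. On a hypothetical 4 × 4 "arrowhead" pattern the rule avoids the fill that pivoting on the dense corner first would create. The function names are illustrative.

```python
def markowitz_choice(rows):
    """Pick the nonzero (i, j) minimizing (r_i - 1) * (c_j - 1) in the active submatrix."""
    col_count = {}
    for cols in rows.values():
        for j in cols:
            col_count[j] = col_count.get(j, 0) + 1
    best = None
    for i in sorted(rows):
        for j in sorted(rows[i]):
            cost = (len(rows[i]) - 1) * (col_count[j] - 1)
            if best is None or cost < best[0]:
                best = (cost, i, j)
    return best[1], best[2]


def diagonal_choice(rows):
    """Naive rule for comparison: pivot on the diagonal in natural order."""
    i = min(rows)
    return i, i


def symbolic_elimination(pattern, choose):
    """Eliminate symbolically; return the pivot order and the number of fill-ins.

    pattern maps each row index to the set of column indices holding nonzeros.
    """
    rows = {i: set(cols) for i, cols in pattern.items()}
    order, fill = [], 0
    while rows:
        p, q = choose(rows)
        pivot_row = rows.pop(p)
        for i, cols in list(rows.items()):
            if q in cols:  # row i receives a multiple of the pivot row
                updated = (cols | pivot_row) - {q}
                fill += len(updated - cols)
                rows[i] = updated
        order.append((p, q))
    return order, fill


if __name__ == "__main__":
    arrow = {0: {0, 1, 2, 3}, 1: {0, 1}, 2: {0, 2}, 3: {0, 3}}
    print(symbolic_elimination(arrow, markowitz_choice))  # fill count 0
    print(symbolic_elimination(arrow, diagonal_choice))   # fill count 6
```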

We shall adopt Markowitz's suggestion. However, instead of recording
the successive transformations of one basis to the next in product form, 
we shall show that it is efficient to generate each substitute inverse in 
turn from its predecessor. The substitute inverse remains compact, of 
fixed size. Thus "reinversions" are unnecessary (except in so far as they 
are needed to restore loss of accuracy due to cumulative round -off error). 

The procedure which we shall give can be applied to a general m x m 
basis without special structure. As such, it is probably competitive with 
the standard product form, for it may have all of its advantages and none 
of its disadvantages. With certain matrix structures, moreover, it appears 
to be particularly attractive. 

We shall focus our remarks on staircase structures. The reader will
find no difficulty in finding an equally efficient way to compact
block-angular structures. Letting $B_{ij}$ be a submatrix of the basis, a basis $B$
with staircase structure has, for example, the form:



†This research has been partially supported by the Office of Naval
Research under Contract Nonr-222(83) and the National Science Foundation
Grant No. G21034 with the University of California. Reproduction in whole
or in part is permitted for any purpose of the United States Government.
(1)



In (2), the marks x, *, and ⊗ indicate the staircase pattern of nonzero
entries in the basis matrix $B$. $P_s$ is some column of coefficients not
in the basis. The asterisks along the diagonal mark the successive pivot
positions. It is assumed (and this need not be true) that the basis can be
reduced to triangular form by pivoting successively on the lower right-hand
element of each submatrix formed by deleting the preceding pivot row
and column. Each pivot operation consists in using the assumed nonzero
diagonal term to eliminate the column variable from all nonzero terms
above the diagonal only. The symbol ⊗ indicates the resulting position of
zero coefficients above the diagonal.



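[Display (2): the staircase basis $B$, with pivot positions * on the diagonal, the column to be dropped marked (drop), and the entering column $P_s$ marked (enter).]   (2)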



Let $T$ be the resulting triangularized matrix; it has the form (3). Note
particularly that the pattern of nonzeros in $T$ is precisely the same as the
pattern of nonzeros on and below the main diagonal of the original basis $B$,
and that $P_s^*$, the transform of $P_s$ under the same row operations, may have
nonzeros in its leading components.



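[Display (3): the triangularized matrix $T$, with the column to be dropped marked (drop) and the transformed entering column $P_s^*$ marked (enter).]   (3)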
The sequence of operations on rows by which $T$ is obtained from $B$ is
equivalent to multiplying $B$ on the left by a succession of elementary
matrices, so that

$$T = E_1 E_2 \cdots E_m B \qquad (4)$$



Here $E_m = E_6$ represents an elementary matrix corresponding to a pivot
in row 6. Thus the first pivot operation is the same as multiplying $B$ on
the left by

$$E_6 = \begin{pmatrix} 1 & & & & & \\ & 1 & & & & \\ & & 1 & & & \\ & & & 1 & & \\ & & & & 1 & p_{56} \\ & & & & & 1 \end{pmatrix} \qquad (5)$$

where $p_{56}$ is selected so that row 6, when multiplied by $p_{56}$ and added to
row 5, will cause the element (5,6) of the matrix to vanish. Since no
eliminations are required in column 5, $E_5$ is an identity matrix. Next, $E_4$ will
be similar to $E_6$ except with one nonzero entry $p_{34}$ for element (3,4), and
$E_3$ will have at most two nonzero entries above the diagonal, $p_{13}$ and $p_{23}$,
corresponding to the factors required to eliminate elements (1,3) and (2,3)
from the matrix using row 3. Similarly, $E_2$ will have an entry $p_{12}$, and $E_1$
will be an identity matrix. Since each elementary matrix $E_i$ is an identity
matrix except for nonzero entries above the diagonal of column $i$, we may,
for purposes of compact recording, simply list side by side the entries in
column 1 of $E_1$, in column 2 of $E_2$, etc. We shall refer to this typical
product form record of the transformations as the E-structure. For our
example,

$$\text{E-structure} = \begin{pmatrix} 1 & p_{12} & p_{13} & & & \\ & 1 & p_{23} & & & \\ & & 1 & p_{34} & & \\ & & & 1 & & \\ & & & & 1 & p_{56} \\ & & & & & 1 \end{pmatrix} \qquad (6)$$



Note again that the pattern of nonzeros in the E-structure (excluding the
units on the diagonal) is precisely the same as the pattern of nonzeros



126 MATHEMATICAL PROGRAMMING 

above the main diagonal of the original basis $B$. Thus the statement in
product form of the nonzero coefficients in the transformations $E_i$
necessary to reduce a basis to triangular form $T$ and the record of nonzeros in
$T$ have as compact a representation as the original basis.

We give the formulas for the determination of the set of simplex
multipliers (or pricing vector) $\pi$ and the representation $\bar P_s$ of the vector $P_s$
entering the basis, when the $E_i$ and $T$ are known. Let $\gamma$ be the vector of
coefficients of the cost form for the basic variables; then by definition

$$\pi B = \gamma \qquad (7)$$

If now we define $\pi^*$ by the relation

$$\pi^* T = \gamma \qquad (8)$$

then it is easy to see, by (4), that

$$\pi = \pi^* E_1 E_2 \cdots E_m \qquad (9)$$

Because $T$ is triangular, $\pi^*$ can be directly computed from (8), and $\pi$ from
(9) by applying to $\pi^*$ the transformations $E_1, E_2, \ldots$ in turn on the right.
Having obtained $\pi$, we can by the usual "pricing out" procedure
determine the vector $P_s$ to enter the basis by

$$c_s - \pi P_s = \min_j \,(c_j - \pi P_j) < 0, \qquad (10)$$

where $c_j$ is the cost coefficient of $P_j$.

By definition, the representation $\bar P_s$ of $P_s$ in terms of the basis satisfies

$$B \bar P_s = P_s \qquad (11)$$

If now we define $P_s^*$ by the relation

$$P_s^* = E_1 E_2 \cdots E_m P_s \qquad (12)$$

then it is again easy to see, by (4) and (11), that

$$T \bar P_s = P_s^* \qquad (13)$$

Relation (12) allows us to compute $P_s^*$, and because $T$ is triangular, $\bar P_s$
is computed by direct solution of (13).
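Relations (4) and (7) to (13) can be traced step by step in a small sketch. As in the text, it assumes that every diagonal pivot is nonzero, so no row permutation is needed. It uses exact rational arithmetic so that the checks are exact. `triangularize` pivots from the lower right and records the above-diagonal multipliers $p_{ki}$ (the E-structure); `price` solves (8) and applies (9); `represent` applies (12) and solves (13). The function names and the tridiagonal test basis are hypothetical.

```python
from fractions import Fraction as F


def triangularize(B):
    """Return T and the E-structure p with T = E_1 E_2 ... E_m B (B holds Fractions).

    Pivots run from the lower right upward; p[k][i] is the multiple of row i
    added to row k (k < i), i.e. the above-diagonal entry of column i of E_i.
    """
    m = len(B)
    T = [list(row) for row in B]
    p = [[F(0)] * m for _ in range(m)]
    for i in range(m - 1, -1, -1):
        if T[i][i] == 0:
            raise ZeroDivisionError("zero pivot: a row permutation would be needed")
        for k in range(i):
            if T[k][i] != 0:
                p[k][i] = -T[k][i] / T[i][i]
                T[k] = [a + p[k][i] * c for a, c in zip(T[k], T[i])]
    return T, p


def price(T, p, gamma):
    """Simplex multipliers pi with pi B = gamma, via (8) and (9)."""
    m = len(T)
    pi = [F(0)] * m
    for j in range(m - 1, -1, -1):  # pi* T = gamma, T lower triangular
        pi[j] = (gamma[j] - sum(pi[k] * T[k][j] for k in range(j + 1, m))) / T[j][j]
    for i in range(m):              # pi = pi* E_1 E_2 ... E_m, applied left to right
        pi[i] += sum(pi[k] * p[k][i] for k in range(i))
    return pi


def represent(T, p, Ps):
    """Representation x of P_s with B x = P_s, via (12) and (13)."""
    m = len(T)
    v = list(Ps)
    for i in range(m - 1, -1, -1):  # P* = E_1 E_2 ... E_m P_s (E_m acts first)
        for k in range(i):
            v[k] += p[k][i] * v[i]
    x = [F(0)] * m
    for j in range(m):              # T x = P*, forward substitution
        x[j] = (v[j] - sum(T[j][k] * x[k] for k in range(j))) / T[j][j]
    return x


if __name__ == "__main__":
    B = [[F(v) for v in row] for row in
         [[2, 1, 0, 0], [1, 3, 1, 0], [0, 1, 4, 1], [0, 0, 1, 5]]]
    T, p = triangularize(B)
    pi = price(T, p, [F(1), F(2), F(3), F(4)])
    x = represent(T, p, [F(1), F(0), F(2), F(1)])
    print("pi =", pi, " representation =", x)
```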

Given P s and the basic feasible solution, the usual rules are next applied 
to determine the vector P r to drop from the basis and to determine the 
basic feasible solution for the next iteration. We shall omit these steps 
assuming they are known to the reader. 

Our problem now becomes one of "updating" our substitute inverse.
This of course could be done by a succession of pivot operations above the
diagonal such as we described earlier. But this is not very efficient
computationally. We shall show instead an efficient procedure for easily
modifying the E-structure and $T$ matrix of one iteration to obtain those of the next.

Let us assume in our example [see (2)] that the vector $P_s$ entering the
basis, if entered in its proper position in the staircase array, would be
located, say, between vectors $P_4$ and $P_5$ of the basis (or vectors
$P_3$ and $P_4$), and let us suppose that vector $P_1$ is to be dropped. Starting
with the columns of $B$ and $P_s$ after they have been transformed by the row
operations $E_1 E_2 \cdots E_m$, namely with $T$ and $P_s^*$ as shown in (3), our
objective is to triangularize the matrix formed by deleting the first column
and introducing $P_s^*$, say, between columns 4 and 5 (actually, between
columns 3 and 4 would be less work). The row operations that accomplish
this are to create zeros in the first three rows of the $P_s^*$ column by
successively adding first a multiple of row 2 to row 1, next a multiple of row 3 to
row 2, and then a multiple of row 4 to row 3. We shall denote these
single-row transformations by $E_2^1$, $E_3^2$, and $E_4^3$. For the present we have assumed
above that the second, third and fourth components of $P_s^*$ are nonvanishing
(this need not be the case). The results of these operations are shown in
(14), where * indicates the elements of the previous diagonal and □ those
of the new diagonal.



[Display (14): the new $T$, marking the dropped column and the position where the new column is entered.]   (14)



The relationship between the new $T$ and the new $B$ may be written

$$(\text{New } T) = E_4^3 E_3^2 E_2^1 E_1 E_2 E_3 E_4 E_5 E_6 \,(\text{New } B) \qquad (15)$$

If, however, the column to be dropped were $j = 4$ (instead of $j = 1$), it
would be necessary to eliminate the □ element in column 3 and then the
ones in column 2 by additional transformations of the type $E_i^k$; in this case
(New $T$) is related to (New $B$) by a product like (15) with these further
factors on the left.

We have shown that the new $T$ can be obtained by applying to the
previous product of the $E_i$ a succession of row operations of the form $E_i^{i-1}$,
where, in general, we have denoted by $E_i^k$ an elementary matrix
corresponding to adding a multiple of row $i$ to row $k$. Our objective, however,
has not been accomplished until we have shown how to obtain easily the
new $T$ directly from the new $B$ by a succession of new pivot operations
$E_i^*$. This is easy to accomplish if we observe the following rules:

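I. If $E_i'$ and $E_i''$ are two elementary matrices representing adding
multiples of row $i$ to other rows, then their product $E_i' E_i''$ can be replaced
by an elementary matrix of the same type, say, $E_i^*$. For example,

$$\begin{pmatrix} 1 & & p_{13}' & \\ & 1 & p_{23}' & \\ & & 1 & \\ & & & 1 \end{pmatrix} \begin{pmatrix} 1 & & p_{13}'' & \\ & 1 & p_{23}'' & \\ & & 1 & \\ & & & 1 \end{pmatrix} = \begin{pmatrix} 1 & & p_{13}' + p_{13}'' & \\ & 1 & p_{23}' + p_{23}'' & \\ & & 1 & \\ & & & 1 \end{pmatrix}$$

II. "Near commutativity" of adjacent-indexed matrices $E_i^{i-1}$ and $E_{i-1}$
holds; thus the product $E_i^{i-1} E_{i-1}$ can be replaced by $E_{i-1} E_i'$, where $E_i'$
is again an elementary matrix that adds multiples of row $i$ to other rows.
For example,

$$\begin{pmatrix} 1 & & & \\ & 1 & & \\ & & 1 & p_{34}' \\ & & & 1 \end{pmatrix} \begin{pmatrix} 1 & & p_{13} & \\ & 1 & p_{23} & \\ & & 1 & \\ & & & 1 \end{pmatrix} = \begin{pmatrix} 1 & & p_{13} & \\ & 1 & p_{23} & \\ & & 1 & \\ & & & 1 \end{pmatrix} \begin{pmatrix} 1 & & & -p_{34}' p_{13} \\ & 1 & & -p_{34}' p_{23} \\ & & 1 & p_{34}' \\ & & & 1 \end{pmatrix}$$

III. Nonadjacent-indexed matrices can be commuted; thus
$E_l^{l-1} E_k = E_k E_l^{l-1}$ if $k < l - 1$.

For our example, let us denote the new $E_i$ by $E_i^*$, so that we are
interested in obtaining the relation

$$(\text{New } T) = E_1^* E_2^* E_3^* E_4^* E_5^* E_6^* \,(\text{New } B)$$

by applying the above rules to (15). In this case

$$E_1^* = E_1 \ \text{(the identity)}, \qquad E_2^* = E_2^1 E_2, \qquad \ldots$$

Note particularly that the formation of each $E_i^*$, from a computational point
of view, consists essentially of multiplying most of the elements of column
$i - 1$ of $E_{i-1}$ by a constant and adding them to the corresponding elements of
column $i$ of $E_i$.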
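Rules I to III are identities between small matrices and can be verified mechanically. The minimal sketch below uses exact rational arithmetic and 0-based indices. `elementary(n, col, entries)` is a hypothetical helper that builds the identity matrix with the given above-diagonal entries in column `col`, that is, a matrix that adds multiples of row `col` to other rows. The multiplier values are arbitrary. For rule II it takes $E_i'$ to have, in column $i$, the entry $q$ in row $i-1$ and $-q$ times the column-$(i-1)$ entries of $E_{i-1}$ above that, where $q$ is the multiplier in $E_i^{i-1}$. This matches the remark that each $E_i^*$ is formed by multiplying elements of column $i-1$ of $E_{i-1}$ by a constant.

```python
from fractions import Fraction as F


def elementary(n, col, entries):
    """Identity matrix with entries[row] placed above the diagonal in column `col`."""
    E = [[F(int(r == c)) for c in range(n)] for r in range(n)]
    for row, value in entries.items():
        assert row < col, "only above-diagonal entries are allowed"
        E[row][col] = F(value)
    return E


def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


# Rule I: two matrices adding multiples of row 2 merge by adding their entries.
a, b, c, d = F(1, 2), F(-3), F(2, 5), F(7)
rule_I = (matmul(elementary(4, 2, {0: a, 1: b}), elementary(4, 2, {0: c, 1: d}))
          == elementary(4, 2, {0: a + c, 1: b + d}))

# Rule II: E_i^{i-1} E_{i-1} = E_{i-1} E_i' with i = 3.
q, p13, p23 = F(3, 4), F(2), F(-5, 3)
E_adj = elementary(4, 3, {2: q})                  # add q * row 3 to row 2
E_prev = elementary(4, 2, {0: p13, 1: p23})
E_new = elementary(4, 3, {0: -q * p13, 1: -q * p23, 2: q})
rule_II = matmul(E_adj, E_prev) == matmul(E_prev, E_new)

# Rule III: E_l^{l-1} commutes with E_k when k < l - 1 (here l = 4, k = 2).
E_l = elementary(5, 4, {3: F(6)})
E_k = elementary(5, 2, {0: F(1, 3), 1: F(-2)})
rule_III = matmul(E_l, E_k) == matmul(E_k, E_l)

print(rule_I, rule_II, rule_III)  # True True True
```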

The process described above of reducing to triangular form the matrix
formed by dropping a column of $T$ and inserting $P_s^*$ was based on the
assumption that certain coefficients of $P_s^*$ were nonzero. If, for example,
the second component is zero but the first component is not, it would not be
possible to use a row operation $E_2^1$ to cause the first component of $P_s^*$ to
vanish.

Let us suppose that the position of column $P_s$ in the new basis is $k$
columns from the left (we assume that pivoting is done by starting always
with the lower right-hand element of each submatrix).

Let the column being dropped from the basis be $r \ne k$. In the process
of computation of $P_s^*$ by (12), we obtain the vector

$$P_s' = E_k E_{k+1} \cdots E_m P_s$$

Now $P_s'$ must have its first nonzero component for some index $h \le k$,
since the new $B$ is nonsingular. We assume for the moment that the $k$th
component is not zero. Accordingly, starting with $P_s'$, the elimination of
its first nonzero component, with index $s_1$, can be effected by using its
second nonzero component, with index $s_2$, etc., until its nonzero component
with index $s_t = k$ is used. This corresponds to row operations of the form
$E_{s_2}^{s_1}$ followed by $E_{s_3}^{s_2}$, etc. The remaining components with indices
$k, k+1, \ldots, m$ are unaffected by the above operations and hence remain
the same as those of $P_s'$. Thus the result is the same, as far as columns
$P_k, P_{k+1}, \ldots, P_m$ are concerned, as if triangularization had been effected
directly using these columns. If $s_2 - (s_1 + 1) = \Delta > 0$, it will be necessary to
permute cyclically certain of the rows by relabeling rows
$s_1, s_1 + 1, \ldots, s_2 - 1$ as rows $s_2 - 1, s_1, s_1 + 1, \ldots, s_2 - 2$. In a similar
manner, rows $s_2, s_2 + 1, \ldots, s_3 - 1$ are permuted if $s_3 - (s_2 + 1) > 0$, etc.
Allowing such permutations, it is no longer necessary to assume above
that the $k$th component of $P_s'$ (or $P_s^*$) was not zero.

It is important to note that such permutations would have been required
if direct triangularization of all columns had been effected initially.
Moreover, as far as staircase-structured systems are concerned, these
permutations would not have affected the below-diagonal staircase form of $T$ or
the above-diagonal staircase form of the E-structure because, if direct
eliminations were used, the eliminations and row interchanges would have
been confined only to rows where the components of $P_s$ are nonzero.

Let us now turn our attention to the column $P_r$ to be dropped from the
basis. Suppose first $r < k$. Deletion of the corresponding column of $T$,
followed by the necessary eliminations to restore triangularity discussed
earlier, will also require permutations if the indicated pivot position along
the diagonal has a zero coefficient. For example, a two-cycle permutation
will be required in order to lower to the diagonal the nonzero coefficient
just above the diagonal. If $r > k$, it appears to be necessary first to drop
the column corresponding to $P_r$ from $T$ and to retriangularize columns
$k, k+1, \ldots, r-1$ (omitting $r$), and next, to insert the column corresponding
to $P_s$ by performing the eliminations described above on $P_s'$.

Since, in general, row permutations are required to obtain the triangular
arrangement in standard form, it is necessary to replace (4) by

$$T = E_1 E_2 \cdots E_m J B \qquad (4')$$

where $J$ represents a permutation matrix. Each new cyclic permutation
$C$ introduced in the process of elimination to a new triangular form can be
accounted for by appropriately relabeling the row designations of coeffi-
cients in $E_i$ and $J$.

Finally, it is necessary to restate the rules given earlier for updating
the substitute inverse when elementary matrices of the type $E_i^l$, where
$l < i$, appear on the left instead of the $E_i^{-1}$ discussed earlier. In this case
the rules are:

II: [identity illegible in the source], where $l < i$ and where, letting $p_l$ be
the $(l, i)$ component of $E_i^l$, the $l$th column of the new elementary matrix is
formed by multiplying by $-p_l$ the corresponding coefficients in column $l$ of
the old one, with the exception of rows $l$ and $i$. For row $l$ the coefficient is
$p_l$, and for row $i$ the coefficient is unity.

III: [illegible in the source]

IV: $E_i^l E_q = E_q E_i^l$ if $l < q < i$, where $i$ is the same $i$ as that which
generated $E_i^l$ in II. Note that the matrices commute because $E_i^l$ has a zero
coefficient in column $i$ for row $q$, and $E_q$ has a zero coefficient in column $q$
for row $i$.
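The commutativity argument in rule IV can be checked numerically. The sketch below assumes that $E_i^l$ is the identity with a single extra entry $p_l$ in position $(l, i)$, and that $E_q$ is an elementary column matrix (the identity with column $q$ replaced). These constructions, the helper names, and the 0-based indices are our own assumptions for illustration. The second product shows that commutativity fails once $E_q$ has a nonzero entry in row $i$ of column $q$.

```python
def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def matmul(A, B):
    """Plain product of two list-of-lists matrices."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def single_entry(n, l, i, p_l):
    """E_i^l: the identity with p_l placed in position (l, i)."""
    E = identity(n)
    E[l][i] = p_l
    return E


def column_eta(n, q, column):
    """E_q: the identity with column q replaced by `column`."""
    E = identity(n)
    for r in range(n):
        E[r][q] = column[r]
    return E


n, l, q, i = 5, 0, 2, 3                        # l < q < i
E_il = single_entry(n, l, i, 7)
E_q = column_eta(n, q, [4, 1, 2, 0, 5])        # zero in row i of column q
print(matmul(E_il, E_q) == matmul(E_q, E_il))  # True

E_q_bad = column_eta(n, q, [4, 1, 2, 6, 5])    # nonzero in row i of column q
print(matmul(E_il, E_q_bad) == matmul(E_q_bad, E_il))  # False
```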



The Simplex Method Using Pseudo-basic Variables for
Structured Linear Programming Problems



E. M. L. Beale

A procedure is described for solving linear programming problems that 
consist of separate subproblems with a few linking variables that occur in 
all (or several) subproblems. This is the simplex method, organized so 
that the advantages of the special structure of the problem are preserved. 



1. INTRODUCTION 

Dantzig and Wolfe [2] have described a "decomposition principle" for 
solving linear programming problems consisting of a set of separate sub- 
problems except for a few "linking equations" containing variables that 
occur in all (or several) subproblems. This paper presents a method of 
solving the dual problem, i.e., one consisting of separate subproblems ex- 
cept for a few linking variables that occur in all (or several) subproblems. 

Problems of this sort arise in many contexts. For example, they arise 
when one is scheduling operations over several time-periods; and also in 
two-stage linear programming under uncertainty when the random variables
have discrete distributions. 

Of course, by the duality theorem, the linking variables problem and 
the linking equations problem can be transformed into one another. But, 
although it was inspired by Dantzig and Wolfe's decomposition principle, 
this work is in fact more closely related to Dantzig's work on block
triangularity [1]. Dantzig there proposed the use of the simplex method
with an "artificial basis $\bar{B}$" and a "true basis $B$." The true basis
consists of the coefficients of the basic variables in the present trial
solution, and the artificial basis differs from this in at most a few
columns and is "square block triangular." The inverse of $\bar{B}$ can then be
stored compactly, and the work is carried out in terms of $\bar{B}^{-1}$ and a


†Part of this work was done at the symposium on Combinatorial Prob-
lems sponsored by the RAND Corporation from July 10 to August IS, 1961. 
I am grateful for helpful comments from G. B. Dantzig, . J. Hoffman,
W. Orchard-Hays and D. M. Smith. Any views expressed in this paper are 
those of the author. They should not be interpreted as reflecting the views 
of The RAND Corporation or the official opinion or policy of any of its gov- 
ernmental or private research sponsors. This paper appeared in print 
previously as RAND Corporation Paper P-2405, August 15, 1961. 
matrix $\eta$ defining the columns of $B$ not in $\bar{B}$ in terms of the columns of
$\bar{B}$.

The present work proceeds similarly. But a more specialized problem 
has been considered, with the result that the lay-out becomes con- 
siderably neater. It seems likely that the proposed algorithm will be 
more efficient for these problems than that derived from the decompo- 
sition principle, since this algorithm follows the simplex method, and 
includes plausible rules for choosing the nonbasic variables to be intro- 
duced into the trial solution. 

The algorithm is motivated in Section 2, described algebraically in 
Section 3, and illustrated by an annotated small-scale numerical example 
in Section 4. Rules for choosing pivotal columns and rows are presented 
in Section 5. These rules do not affect the theoretical properties of the 
algorithm, but they may be vital to its practical efficiency. Finally, 
some general remarks about solving linear programming problems with 
special structure are offered in Section 6. 

For ease of exposition, the various stages of the algorithm are written 
out in explicit equation form in the first part of the paper, and the 
numerical example is given in the detached coefficient form. In practice,
there is no difficulty about using the inverse matrix method for the sub-
problems. But it may be best to store the coefficients of the linking
variables in explicit tableau form, since they are used extensively and are
subject to both row and column operations. (They correspond to Dantzig's
essential columns of $\eta$, which he also visualized as being stored ex-
plicitly.) In this respect the algorithm is less compact than that derived
from the decomposition principle when the linking coefficients are
sparse, since these linking coefficients can then conveniently be manipulated
through the inverse matrix.



2. MOTIVATION 

The essential idea of this method is that the linking variables should 
be regarded as parameters. It is obvious that if these parameters are 
given specific numerical values, then it is a straightforward matter to 
solve the subproblems to optimize the objective function for these 
parameter values. It then "only" remains to see whether we can do even 
better by changing the parameters. If these were originally given arbi-
trary values, then it is almost certain that the solution can be improved
by either increasing or decreasing the value of one parameter, keeping
the others constant. Because the problem is linear, this situation will
persist until some basic variable becomes zero. If one were using the
ordinary simplex method, one would then make this variable nonbasic, in
place of the parameter. But we cannot do this without spoiling the struc-
ture of the problem, so we do not do it. Instead, we make a trans-
formation of parameters, so that if we subsequently change one of the
other parameters we do not change this zero-valued basic variable. I call
this a "pseudo-basic" variable, since it appears in the tableau as a basic
variable but is really a nonbasic variable from the point of view of the
rationale of the simplex method.

The resulting process can therefore be regarded as the simplex 
method, organized in such a way that some nonbasic variables remain 
on the left-hand sides of the equations so as to avoid spoiling the struc- 
ture of the problem. They correspond to variables in the artificial basis 
but not in the true basis in Dantzig's block triangular scheme. 



3. ALGEBRAIC DESCRIPTION OF THE ALGORITHM 

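The problem to be solved can be represented as follows: Find non-
negative vectors $\theta$, $x^{(l)}$, $l = 1, \ldots, L$, so as to minimize

$$C = c^{(0)}\theta + \sum_{l=1}^{L} c^{(l)} x^{(l)} \qquad (1)$$

subject to the constraints

$$T^{(0)}\theta = b^{(0)},$$
$$T^{(l)}\theta + A^{(l)} x^{(l)} = b^{(l)} \qquad (l = 1, \ldots, L).$$

We refer to the components of $\theta$ as parameters.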
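To make the role of the parameters concrete, the following Python sketch computes $b^{(l)} - T^{(l)}\theta$ for fixed values of $\theta$. The helper `subproblem_rhs` and the data are hypothetical. Once $\theta$ is fixed, each subproblem $A^{(l)}x^{(l)} = b^{(l)} - T^{(l)}\theta$, $x^{(l)} \ge 0$, involves only its own variables. This is the separability that the method is designed to preserve.

```python
from fractions import Fraction


def mat_vec(M, v):
    """Matrix-vector product with exact rational arithmetic."""
    return [sum(Fraction(a) * b for a, b in zip(row, v)) for row in M]


def subproblem_rhs(T_blocks, b_blocks, theta):
    """For fixed parameter values theta, return b^(l) - T^(l) theta for each l.

    Subproblem l then reads A^(l) x^(l) = rhs[l], x^(l) >= 0, and can be
    solved without reference to any other subproblem.
    """
    rhs = []
    for T, b in zip(T_blocks, b_blocks):
        T_theta = mat_vec(T, theta)
        rhs.append([Fraction(b_i) - t_i for b_i, t_i in zip(b, T_theta)])
    return rhs


# Hypothetical data: two subproblems with two rows each, and p = 2 parameters.
T_blocks = [[[1, 2], [0, 1]], [[3, 0], [1, 1]]]
b_blocks = [[4, 2], [6, 3]]
print(subproblem_rhs(T_blocks, b_blocks, [1, 1]))
```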

To start the algorithm, we express these constraints in "solved form,"
so that they read

$$\theta_i = b_i + \sum_{j=1}^{p} t_{ij}\theta_j,$$
$$x_{il} = b_{il} + \sum_{j=1}^{p} t_{ijl}\theta_j + \sum_k a_{ikl}x_{kl},$$
$$C = c_0 + \sum_{j=1}^{p} c_j\theta_j + \sum_{k,l} c_{kl}x_{kl}, \qquad (2)$$

where the variables on the left-hand sides are all distinct from the
variables on the right-hand sides. The $\theta_i$ and $x_{il}$ may be artificial
variables, in which case the objective function will contain an overriding
term representing the sum of the infeasibilities. But this introduces no
new principle or complication. When an artificial variable or parameter
becomes nonbasic, it may of course be dropped from the problem.

It is natural to start by giving the parameters some plausible values, 
and solving the subproblems for these parameter values. One can, of 
course, start with all the parameters equal to zero, in which case one has 
a genuine basic solution to the problem. But these will often be very un- 
realistic values, leaving a long way further to go to the optimum. And one 
may have studied the subproblems separately, with the result that one has
a fair idea about what these parameters should be. 
Let our plausible values be

$$\theta_j = \alpha_j \qquad (j = 1, \ldots, p).$$

Then we define $p$ new parameters $\phi_j$ by the equations

$$\theta_j = \alpha_j + \phi_j \qquad (j = 1, \ldots, p),$$

write these $p$ equations at the top of the tableau, and substitute for $\theta_j$
throughout the remaining equations.

Since our "plausible values" of the $\phi_j$ are all zero, this means that the
constant terms involved in solving the subproblems for these plausible
parameter values are gathered together in the first column of the tableau.
But these additional $p$ equations play a central role in the algorithm, and
we would have written them in even if our "plausible values" of the $\theta_j$ had
all been taken as zero.
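The following minimal sketch applies the substitution $\theta_j = \alpha_j + \phi_j$ to a single equation. It assumes each row is stored as a constant plus a list of coefficients of the $\theta_j$, and the helper name `shift_parameters` is hypothetical. Only the constant term changes, and the new parameters keep the coefficients of the old ones. Exact rational arithmetic (`fractions.Fraction`) is used so that results can be compared with hand calculations. The usage lines take the equation for $x_{11}$ and the objective from the numerical example of Section 4, with all $\alpha_j = 1$.

```python
from fractions import Fraction


def shift_parameters(const, coeffs, alpha):
    """Rewrite const + sum_j coeffs[j]*theta_j using theta_j = alpha_j + phi_j.

    Returns (new constant, coefficients of the phi_j). The coefficients are
    unchanged; only the constant absorbs sum_j coeffs[j]*alpha_j.
    """
    shift = sum(Fraction(t) * Fraction(a) for t, a in zip(coeffs, alpha))
    return Fraction(const) + shift, list(coeffs)


alpha = [1, 1, 1]                                # plausible values used in Section 4
print(shift_parameters(2, [-1, -2, 2], alpha))   # x11 row: constant becomes 1
print(shift_parameters(0, [-3, -2, -1], alpha))  # parameter part of C: constant becomes -6
```

The constant $-6$ agrees with the constant term of $C$ in the first tableau of the example.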

After having solved the subproblems for these parameter values, we
will have a tableau of the following form:

$$\theta_i = b_i + \sum_j t_{ij}\phi_j,$$
$$x_{il} = b_{il} + \sum_j t_{ijl}\phi_j + \sum_k a_{ikl}x_{kl},$$
$$C = c_0 + \sum_j c_j\phi_j + \sum_{k,l} c_{kl}x_{kl}, \qquad (3)$$

where all $c_{kl} \ge 0$. The coefficients and variables occurring on the right-
hand sides of (3) will of course not be numerically equal to those in (2),
and the first group of equations (for $\theta_i$) will be $p$ more in number than
they were in (2).

The next stage of the algorithm amounts to solving the linear pro-
gramming problem defined by (3), with the restriction that the nonbasic
variables $x_{kl}$ must remain equal to zero. This is done by the simplex
method using the extended dual tableau.

The extended dual tableau contains the definitions of all variables, in-
cluding the nonbasic ones, in terms of the nonbasic variables. It therefore
includes a permutation matrix amongst its rows. But here the nonbasic
variables must be regarded as new transformed parameters, and not
identified with the corresponding $x_{il}$, since they are really equal to

$$x_{il} - \sum_k a_{ikl}x_{kl}.$$

There is also a complication in that this stage starts off from a nonbasic
solution. But this is met by first taking each of the original $\phi_j$ in turn,
and either increasing or decreasing it (according to the sign of its co-
efficient in the current expression for $C$) until some basic variable becomes
equal to zero. We then make this variable pseudo-basic, introducing into the
nonbasic set an associated parameter defined by the first $p + 1$ terms of the
expression for this pseudo-basic variable. Once each $\phi_j$ has
been processed in this way we always have $p$ pseudo-basic variables
(amongst the $\theta_i$ and $x_{il}$), each associated with some parameter. There
will then be the correct number of genuine basic variables, since $p$
additional variables were added to the left-hand sides of the equations in
the transformation from (2) to (3).
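Moving one parameter until some basic variable becomes zero is an ordinary ratio test. The sketch below shows one way to write it, under our own storage assumption: each row is a list [constant, coefficient of $\phi_1$, ..., coefficient of $\phi_p$], and every listed variable must stay nonnegative. `parameter_step` is a hypothetical name, and ties are broken by keeping the first blocking row found, in the spirit of the arbitrary choice made in the example. The usage data are the constant and parameter entries of the $C$, $\theta_2$, $x_{21}$ and $x_{32}$ rows as we read them from the second tableau of Section 4.

```python
from fractions import Fraction


def parameter_step(c_row, rows, j):
    """Ratio test for moving parameter j in the direction that lowers C.

    c_row and every entry of `rows` are lists
    [constant, coef of phi_1, ..., coef of phi_p]; `rows` maps a variable
    name to its row, and each of these variables must stay nonnegative.
    Returns (signed change in phi_j, name of the blocking variable).
    """
    if c_row[j] == 0:
        return Fraction(0), None              # C does not depend on phi_j
    direction = 1 if c_row[j] < 0 else -1     # raise phi_j if C falls as it rises
    best, blocker = None, None
    for name, row in rows.items():
        rate = direction * row[j]             # change in this variable per unit move
        if rate < 0:                          # only decreasing variables can block
            limit = Fraction(row[0]) / -rate
            if best is None or limit < best:  # ties: keep the first one found
                best, blocker = limit, name
    if best is None:
        return None, None                     # nothing blocks this direction
    return direction * best, blocker


C2 = [-9, 3, 4, -7]
rows = {"theta2": [1, 0, 1, 0], "x21": [2, 1, 3, -3], "x32": [0, 1, -1, -2]}
print(parameter_step(C2, rows, 2))   # phi_2 is decreased by 2/3; x21 blocks
```

This reproduces the step taken in the example, where $\phi_2^1$ is decreased until $x_{21}$ becomes zero.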

Note that a new parameter is introduced into the nonbasic set at each
iteration. Nevertheless, the tableau does not grow longer, because these
parameters are not sign-restricted and they can be dropped as soon as
they cease to be nonbasic. In fact, if one looks only at the tableau and not
the names of the nonbasic variables, one is simply doing the simplex
method in the extended dual tableau, as pointed out earlier.

Eventually we will obtain a tableau like (3) with all $c_j \ge 0$. This implies
that $C$ cannot be further reduced by varying the values of the basic and
pseudo-basic variables, keeping the genuine nonbasic variables, i.e., the
$x_{kl}$, equal to zero. Following the usual procedure in the simplex method,
we must then consider whether it would be profitable to increase some
nonbasic variable $x_{kl}$, keeping the pseudo-basic variables and the other
nonbasic variables equal to zero.

So far, each operation has been a very simple one. We have been 
working exclusively with the first p + 1 columns of the tableau, the re- 
maining columns being simply copied from one tableau to the next except 
while we were solving the subproblems for the initial plausible values of
the parameters, when we only needed to work with the subproblems in- 
dividually. But now we have to pay a modest price for this simplification. 

When considering the genuine nonbasic variables, we find that the 
process of computing the effect of a unit increase in such a variable is 
somewhat more complicated than usual (unless one is using the product 
form of the inverse, in which case this is already a fairly complicated 
operation). Then, if any increase is necessary, up to 3 changes of 
variables may be required. One involves only the subproblem concerned, 
the second (which may not be needed) involves only the coefficients of 
one parameter, and the last involves only the constant terms. In a com- 
puter program these stages may be combined, but for ease of exposition 
they are here presented separately. 

To find the unit effect of increasing some nonbasic variable $x_{kl}$ we
must find the coefficient $c^*_{kl}$ of this variable in the expression for $C$ in
terms of the pseudo-basic and nonbasic variables. This must be derived
from the tableau represented by (3), where $C$ is expressed in terms of the
parameters and nonbasic variables, by substituting for the parameters in
terms of the pseudo-basic and nonbasic variables.

Now if the variable $x_{il}$ is pseudo-basic, the expression for it in (3) will
be of the form

$$x_{il} = \phi_{(i)} + \sum_k a_{ikl}x_{kl}, \qquad (4)$$

so that the parameter $\phi_{(i)}$ can be represented as

$$\phi_{(i)} = x_{il} - \sum_k a_{ikl}x_{kl}, \qquad (5)$$

and hence

$$c^*_{kl} = c_{kl} - \sum_i c_{(i)}a_{ikl}, \qquad (6)$$

where the summation ranges over all $i$ for which $x_{il}$ is pseudo-basic,
$\phi_{(i)}$ is the parameter associated with it, and $c_{(i)}$ is the coefficient of
$\phi_{(i)}$ in the expression for $C$.

If, and only if, $c^*_{kl} < 0$ can $C$ be reduced by increasing $x_{kl}$ and keeping
all other nonbasic and pseudo-basic variables equal to zero. If there are
several negative $c^*_{kl}$, the algorithm, like the original simplex method,
does not depend for its theoretical properties on which one is chosen.
Discussion of suitable selection rules is therefore deferred until
Section 5.

So let us suppose that we have found some nonbasic variable $x_{kl}$ such
that $c^*_{kl} < 0$, and that we wish to increase $x_{kl}$. Now it is just possible that
there will be no pseudo-basic variable $x_{il}$ in the same subproblem such
that $a_{ikl} \ne 0$. Then we can perform a regular simplex step, i.e., pivotal
operation, within the subproblem. This will increase $x_{kl}$ immediately,
making it a genuine basic variable, and making some previously genuine
basic variable nonbasic.

But the usual situation will be to find some pseudo-basic variable $x_{il}$ in
this subproblem such that $a_{ikl} \ne 0$. Then we cannot increase $x_{kl}$ directly
without changing the value of $x_{il}$, which we do not want to do. We there-
fore have to perform a preliminary pivotal operation within the subproblem
to make $x_{kl}$ pseudo-basic in place of some existing pseudo-basic
variable $x_{il}$ in this subproblem. If there are several such pseudo-basic
variables, then again the algorithm theoretically does not depend on
which one is chosen, and further discussion is deferred until Section 5.

At this stage it is desirable to study the formulas in detail. We denote
the nonbasic variable we ultimately wish to increase by $x_{sl}$. The pseudo-
basic variable to be made nonbasic is denoted by $x_{rl}$, and the other pseudo-
basic variables in the same subproblem, if any, are denoted generically by
$x_{hl}$. Let the parameters associated with the pseudo-basic variables $x_{rl}$
and $x_{hl}$ be $\phi_{(r)}$ and $\phi_{(h)}$. Let $x_{il}$ denote some other basic variable in the
same subproblem, $\phi_j$ some other parameter, and $x_{kl}$ some other non-
basic variable in the same subproblem.

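Then the tableau reads, in part,

$$\begin{aligned}
x_{rl} &= \phi_{(r)} + a_{rsl}x_{sl} + a_{rkl}x_{kl}\\
x_{hl} &= \phi_{(h)} + a_{hsl}x_{sl} + a_{hkl}x_{kl}\\
x_{il} &= b_{il} + t_{i(r)}\phi_{(r)} + t_{i(h)}\phi_{(h)} + t_{ij}\phi_j + a_{isl}x_{sl} + a_{ikl}x_{kl}\\
C &= c_0 + c_{(r)}\phi_{(r)} + c_{(h)}\phi_{(h)} + c_j\phi_j + c_{sl}x_{sl} + c_{kl}x_{kl}
\end{aligned} \qquad (7)$$

And we have

$$c^*_{sl} = c_{sl} - c_{(r)}a_{rsl} - \sum_h c_{(h)}a_{hsl}. \qquad (8)$$

After the pivotal operation this part of the tableau reads

$$\begin{aligned}
x_{sl} &= -P\phi_{(r)} + Px_{rl} - Pa_{rkl}x_{kl}\\
x_{hl} &= \phi_{(h)} - Pa_{hsl}\phi_{(r)} + Pa_{hsl}x_{rl} + a'_{hkl}x_{kl}\\
x_{il} &= b_{il} + (t_{i(r)} - Pa_{isl})\phi_{(r)} + t_{i(h)}\phi_{(h)} + t_{ij}\phi_j + Pa_{isl}x_{rl} + a'_{ikl}x_{kl}\\
C &= c_0 + (c_{(r)} - Pc_{sl})\phi_{(r)} + c_{(h)}\phi_{(h)} + c_j\phi_j + Pc_{sl}x_{rl} + c'_{kl}x_{kl}
\end{aligned} \qquad (9)$$

where

$$P = 1/a_{rsl}, \quad a'_{hkl} = a_{hkl} - Pa_{rkl}a_{hsl}, \quad a'_{ikl} = a_{ikl} - Pa_{rkl}a_{isl}, \quad c'_{kl} = c_{kl} - Pa_{rkl}c_{sl}.$$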



Now, if there are any other pseudo-basic variables $x_{hl}$ in this sub-
problem with $a_{hsl} \ne 0$, we must "clean up" the coefficients of $\phi_{(r)}$ in the
expressions for these variables. Otherwise, when we change the value of
$\phi_{(r)}$ to change $x_{sl}$, we will also change $x_{hl}$. So we write

$$\phi_{(h)} = \phi'_{(h)} + Pa_{hsl}\phi_{(r)} \qquad (10)$$

and substitute throughout (9) for $\phi_{(h)}$ in terms of $\phi_{(r)}$ and $\phi'_{(h)}$. We can
do this for all pseudo-basic variables $x_{hl}$ in the subproblem in one
operation. It affects only the column of coefficients of $\phi_{(r)}$: we add
$Pa_{hsl}$ times the column of coefficients of $\phi_{(h)}$, and the tableau then
reads

$$\begin{aligned}
x_{sl} &= -P\phi_{(r)} + Px_{rl} - Pa_{rkl}x_{kl}\\
x_{hl} &= \phi'_{(h)} + Pa_{hsl}x_{rl} + a'_{hkl}x_{kl}\\
x_{il} &= b_{il} + \Big(t_{i(r)} - Pa_{isl} + \sum_h Pa_{hsl}t_{i(h)}\Big)\phi_{(r)} + t_{i(h)}\phi'_{(h)} + t_{ij}\phi_j + Pa_{isl}x_{rl} + a'_{ikl}x_{kl}\\
C &= c_0 + \Big(c_{(r)} - Pc_{sl} + \sum_h Pa_{hsl}c_{(h)}\Big)\phi_{(r)} + c_{(h)}\phi'_{(h)} + c_j\phi_j + Pc_{sl}x_{rl} + c'_{kl}x_{kl}
\end{aligned} \qquad (11)$$

Now at last we can increase $x_{sl}$ by increasing $\phi_{(r)}$ if $a_{rsl}$ (and
hence $P$) is negative, or by decreasing $\phi_{(r)}$ if $a_{rsl}$ is positive. (Since
the coefficient of $\phi_{(r)}$ in the expression for $C$ in (11) is simply $-Pc^*_{sl}$
from (8), this change must be profitable if we calculated $c^*_{sl}$ correctly in
the first instance.)
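The algebra of the pivot and clean-up steps can be checked with exact arithmetic. The following sketch assumes a single $x_{hl}$, $x_{il}$, $\phi_j$ and $x_{kl}$ and uses made-up coefficients. The generic `pivot` helper and the dictionary layout (column names, with "one" for the constant) are our own. The script performs the pivot that turns (7) into (9), applies the substitution (10), and then checks two claims from the text: $x_{hl}$ no longer involves $\phi_{(r)}$, and the coefficient of $\phi_{(r)}$ in $C$ equals $-Pc^*_{sl}$ with $c^*_{sl}$ from (8).

```python
from fractions import Fraction as F


def pivot(rows, leaving, entering):
    """Make `entering` a left-hand-side variable in place of `leaving`.

    Each row is a dict {column: coefficient}; "one" is the constant column.
    """
    row = rows.pop(leaving)
    piv = row[entering]
    # Solve the pivot row for the entering variable.
    new = {c: -v / piv for c, v in row.items() if c != entering}
    new[leaving] = 1 / piv
    # Substitute that expression into every other row.
    for r in rows.values():
        coef = r.pop(entering, 0)
        for c, v in new.items():
            r[c] = r.get(c, 0) + coef * v
    rows[entering] = new
    return rows


# Hypothetical coefficients for the partial tableau (7).
a_rs, a_rk, a_hs, a_hk = F(-2), F(1), F(3), F(-1)
c_r, c_h, c_s = F(1), F(2), F(1)
rows = {
    "x_r": {"phi_r": F(1), "x_s": a_rs, "x_k": a_rk},
    "x_h": {"phi_h": F(1), "x_s": a_hs, "x_k": a_hk},
    "x_i": {"one": F(5), "phi_r": F(1), "phi_h": F(2), "phi_j": F(-1),
            "x_s": F(1), "x_k": F(2)},
    "C": {"one": F(0), "phi_r": c_r, "phi_h": c_h, "phi_j": F(0),
          "x_s": c_s, "x_k": F(2)},
}
c_star = c_s - c_r * a_rs - c_h * a_hs   # equation (8)
P = 1 / a_rs

pivot(rows, "x_r", "x_s")                # equation (9)
# Equation (10): phi_h = phi_h' + P*a_hs*phi_r; column "phi_h" now holds phi_h'.
for r in rows.values():
    r["phi_r"] = r.get("phi_r", 0) + P * a_hs * r.get("phi_h", 0)

print(rows["x_h"]["phi_r"])              # 0: x_h no longer depends on phi_r
print(rows["C"]["phi_r"], -P * c_star)   # both -3/2, as (11) predicts
```

Here $c^*_{sl} = -3$ and $a_{rsl} < 0$, so by the rule just stated $x_{sl}$ is increased by increasing $\phi_{(r)}$, and the negative coefficient $-3/2$ confirms that this lowers $C$.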

We now resume the process of optimizing the problem conditional on 
keeping all genuine nonbasic variables equal to zero. 

The algorithm is now complete, in that we have seen how to improve the 
trial solution by changing a pseudo-basic variable, or by changing a non- 
basic variable. If no such change is profitable then we know we have an 
optimal solution, just as we would if we had carried out our simplex 
calculations in the standard way. 

Of course it is not essential to find the very best solution with a given 
set of basic and pseudo-basic variables before examining the possibility 
of introducing some nonbasic variable. Indeed the following sort of scheme 
might work out best in practice. 

1. After solving the subproblems for the original "plausible values" for
the parameters, eliminate each of the original $\phi_j$ in turn from the nonbasic
set. We then have a genuine basic trial solution.

2. If possible, introduce some nonbasic variable from the first sub-
problem.

3. If possible, introduce some pseudo-basic variable.

4. If possible, introduce some nonbasic variable from the second sub- 
problem. 

And so on. 



4. A NUMERICAL EXAMPLE 

For the benefit of those who, like the present author, prefer numbers
to formulas, I now present a miniature-scale numerical example.
Minimize

$$C = -3\theta_1 - 2\theta_2 - \theta_3 + 2x_{41} + x_{51} + x_{61} + x_{42} + x_{52} + 5x_{62},$$

subject to the constraints $\theta_j \ge 0$, $x_{il} \ge 0$, and

$$x_{11} = 2 - \theta_1 - 2\theta_2 + 2\theta_3 + x_{41} + x_{51} + x_{61}$$



+ QS -x 42 + x 52 -f 2x 62 

2 4- x 42 - x 52 * x 62 

This problem consists of two 3 × 3 subproblems with three linking
variables. In fact the tableau is barely sparse enough to justify using a
special method, but the problem serves to illustrate the technique.

Let us suppose that $\theta_1 = \theta_2 = \theta_3 = 1$ are "plausible values" for the
parameters. Then we start by writing the tableau in the form
X 21
1


-1 


1 
X 12


3 


1 




-2 






-1 


1 




X 22 






-1 


1 






-1 


1 


2 


X 32 


1 


-1 


-3 








1 


-1 


1 


C 


-6 


-3 


-2 


-1 


2 1 


1 


1 


1 


5 



Note that all of this tableau except the first column has in effect simply
been copied from the previous tableau. Similar situations occur throughout
the algorithm, but the tableau is nevertheless presented in full at each
stage to make it easier to follow.

The parameters $\phi_j$ have been given superfixes to distinguish them from
their successors in the nonbasic set. For practical purposes these super-
fixes are just decorations.

The problem of solving the subproblems conditional on keeping the
$\phi_j^1 = 0$ is a standard exercise in linear programming. In the example this
is already achieved. We therefore proceed to investigate changes in the
parameters.

At the moment we do not have a basic trial solution, since we have 8
variables at nonzero levels (plus one accidental zero), and a basic solution
should have only 6. So we proceed to increase or decrease each $\phi_j^1$ in turn
until we obtain a "pseudo-basic" variable with a trial value of zero on the
right-hand side of some equation. We do this so as not to increase $C$ at
any stage.

Since the coefficient of $\phi_1^1$ in the expression for $C$ is negative, we in-
crease $\phi_1^1$. Comparing coefficients of $\phi_1^1$ with the constant terms in the
usual way, we see that this remains possible until $\phi_1^1 = 1$, when both $x_{11}$
and $x_{32}$ become zero. We arbitrarily select $x_{11}$ from these to be the new
"pseudo-basic" variable. We then introduce a new parameter $\phi_1^2$, defined
by the expression for $x_{11}$ without the genuine nonbasic variables, and
substitute for $\phi_1^1$ throughout. We then have the tableau

j Q 



Q 
m 4 


1 


-1 


-2 


2 








<3j 


2 


-1 


-2 


2 








e z 


1 




1 










e s 


1 






1 








* x n 




1 




1 


1 


1 




X 21 


2 


1 


3 


-3 1 


1 






X 31 


4 


-1 


1 


1 


1 


2 




x 12 


4 


-1 


-2 






-1 1 




X 22 






-1 


1 




-1 1 


2 


X 32 




1 


-1 


-2 




1 -1 


1 


C 


-9 


3 


4 


-7 2 


1 


111 


5 
An asterisk has been placed against $x_{11}$ to indicate that it is a
pseudo-basic variable. This tableau illustrates the purpose of the trans-
formation of parameters. In the present trial solution $x_{11} = 0$, and we
want to be sure that it will not immediately become negative when we vary
some other parameter.

We now consider the parameter $\phi_2^1$. This has a positive coefficient in
the current expression for $C$, so we decrease it. We can do this until $x_{21}$
becomes zero, and hence we introduce the new parameter $\phi_2^2$ defined by
the expression for $x_{21}$ without the genuine nonbasic variables, and
substitute for $\phi_2^1$ throughout. We then have the tableau
X 52 X 62
-1/3


1/3 
10/3


-1/3 


-2/3 












B l 


1/3 


-1/3 


1/3 


1 










Q 


1 






1 










**n 




1 




1 


1 


1 






* x *i 






1 


1 


1 








x n 


14/3 


-2/3 


-1/3 




1 


2 






X|j 


16/3 


-1/3 


-2/3 


-2 




-1 


1 




x n 


2/3 


1/3 


-1/3 






-1 


1 


2 


X,j 


2/3 


4/3 


-1/3 


-3 




1 


-1 


1 


C 


-35/3 


5/3 


4/3 


-3 2 


1 


1 1 


1 


5 



We now increase $\phi_3^1$ until $x_{32}$ becomes zero, introduce the new param-
eter $\phi_3^2$ defined by the expression for $x_{32}$ without the genuine nonbasic
variables, and substitute for $\phi_3^1$ throughout. We then have the tableau



X 81 



X i2 X 62 



0J 


2/9 


4/9 


-1/9 


-1/3 










#1 


10/3 


-1/3 


-2/3 












$2 


5/9 


1/9 


2/9 


-1/3 










B l 


11/9 


4/9 


-1/9 


-1/3 










**U 




1 






1 1 


1 






* x fl 






1 




1 1 








x 


14/3 


-2/3 


-1/3 




1 


2 






x tt 


44/9 


-11/9 


-4/9 


2/3 




-1 


1 




*tt 


2/3 


1/3 


-1/3 






-1 


1 


2 


** 








1 




1 


-1 


1 


C 


-37/3 


1/3 


5/3 


1 


2 1 


1 1 


1 


5 
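The three parameter changes made so far touch only the constant and parameter columns, so they can be replayed from those columns alone. The sketch below uses a hypothetical helper `new_parameter`. Its inputs are the constant and parameter entries of the $C$, $x_{11}$, $x_{21}$ and $x_{32}$ rows as we read them from the equations and tableaux above. The scanned tableaux are partly illegible, so these inputs are our reading rather than a verified transcription. Each call replaces one parameter by the new parameter defined by a row, which is the substitution step used three times above.

```python
from fractions import Fraction as F


def new_parameter(row, defn, j):
    """Rewrite `row` after parameter j is replaced by a new parameter.

    Rows are lists [constant, coef_1, ..., coef_p]. The new parameter is
    defined by `defn`, i.e. new = defn[0] + sum_i defn[i] * old_i.
    """
    ratio = F(row[j]) / F(defn[j])
    out = [F(r) - ratio * d for r, d in zip(row, defn)]
    out[j] = ratio  # coefficient of the new parameter
    return out


C1 = [-6, -3, -2, -1]                 # C in the first tableau
x11 = [1, -1, -2, 2]                  # defines phi_1^2
C2 = new_parameter(C1, x11, 1)
x21 = [2, 1, 3, -3]                   # x21 in the second tableau; defines phi_2^2
C3 = new_parameter(C2, x21, 2)
x32 = new_parameter([0, 1, -1, -2], x21, 2)  # x32 carried into the third tableau
C4 = new_parameter(C3, x32, 3)        # x32 defines phi_3^2
for row in (C2, C3, C4):
    print([str(v) for v in row])
```

The three resulting rows match the constant and parameter entries printed in the $C$ rows of the second, third and fourth tableaux ($-9$, $-35/3$ and $-37/3$ for the constants).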



We now consider whether any nonbasic variable from the first sub-
problem can usefully be increased. To illustrate the process of computing
the $c^*_{kl}$, we express $C$ completely in terms of the pseudo-basic and non-
basic variables, though in practice only the coefficients of $x_{41}$, $x_{51}$ and $x_{61}$
would be calculated at this point. We have




+ 2x 41 + x 5i + x sl + x*2 + x 52 + 5x tt 



37 

c = --r 



X 6 l) 



+ 5., 



2x si 



So it is profitable to increase $x_{51}$. But there are 2 pseudo-basic
variables in this subproblem, and we cannot make $x_{51}$ basic directly. We
therefore pivot between $x_{51}$ and $x_{21}$ to make $x_{51}$ pseudo-basic. This
produces the following tableau.



X S1 



,1 


10/3 


-1/3 


6j 


5/9 


1/9 


<?3 


11/9 


4/9 


Xll 




1 


X 5 1 


14/3 


-2/3 


x 


44/9 


-11/9 


Xj 2 

Cw 


2/3 


1/3 


*Xs2 
C 


-37/3 


1/3 


We now put




,2 _ .1 _ A l 
^1-01 92 





-2/3 
2/9 

-1/9 
-1 

-4/3 
-4/9 
-1/3 

2/3 



-1/3 
-1/3 



2/3 





1 1 




-1 


1 




-1 


1 2 






-1 1 


2 




1 -1 


1 


1 


1111 


5 



This affects only the column of coefficients of $\phi_2^2$, and we have the
tableau






10/3 
5/9 


-1/3 
1/9 


-1 
1/3 


-1/3 


4 


11/9 


4/9 


1/3 


-1/3 


*Xli 




1 






*X 5 l 






-1 




9JL 

X 12 


14/3 
44/9 


-2/3 
-11/9 


-2 
-5/3 


2/3 


X22 


2/3 


1/3 




l 


C 


-37/3 


1/3 


1 


1 



1 
1 
1 


1 
2 








-1 


1 






-1 


1 


2 




1 


-1 


1 


1 


1 1 


1 


5 
We are at last in a position to increase $x_{51}$, by decreasing $\phi_2^2$. We can
continue to do this until $\theta_1$ vanishes. Since the new pseudo-basic variable
is a parameter, and not a variable in a subproblem, we could make it the
new parameter. But the notation is more uniform if we give it a new
name, $\phi_2^3$. We then have the tableau



X 61 



X 52 



*2 


-5/3 


~l/3 


3 


1 




8 1 


5 




-3 


~1 




*#2 






1 






#3 


2/3 


1/3 


1 






* x li 




1 






1 1 


X M 


5/3 


1/3 


-3 


-1/3 -1 


1 


X 3i 


8 




-6 


-2 -1 


1 2 


X 12 


23/3 


-2/3 


-5 


"-1 


-1 1 




2/3 


1/3 






-112 


*X*2 








1 


1-11 


C 


-14 




3 


2 1 


11115 



Since the coefficients of the parameters in the expression for C are all 
nonnegative, we cannot usefully increase any pseudo-basic variable. We 
therefore look at the nonbasic variables in the second subproblem. We 
find that their coefficients in the expression for C in terms of the pseudo- 
basic and nonbasic variables are given by 



~ x $2 



- 2 



So we can profitably increase $x_{42}$. But we must first make $x_{42}$ pseudo-
basic in place of $x_{32}$. This produces the following tableau.

01 02 03 X 41 X 21 X 61 X 32 X 52 X 62 



*1 


5 




-3 


-1 










*#2 






1 












#3 


2/3 


1/3 


1 












**11 




1 






1 


1 






X 51 


5/3 


1/3 


-3 


-1 


-1 1 








X 3i 


8 




-6 


-2 


-1 1 


2 






x t2 


23/3 


-2/3 


-5 








1 


1 


X 22 


2/3 


1/3 




1 






-1 


3 


*X|2 








-1 






1 


1 -1 


C 


-14 




3 


1 


1 1 


1 


1 


2 4 



Since there is no other pseudo-basic variable in the second sub-
problem, we can immediately proceed to increase $x_{42}$ by decreasing
$\phi_3^2$. We can do this until $x_{22}$ becomes equal to zero. Then we introduce
the new parameter $\phi_3^3$, and have the tableau

I 2 J.2 j2 -V _ _ V_. V^., YayK Yr* )C* 



*1 


-2/3 


-1/3 




1 




<9 t 


17/3 


1/3 


-3 


-1 




*0 2 






1 









2/3 


1/3 


1 






*x 




1 






1 1 


X 51 


7/3 


2/3 


-3 


-1 


-1 1 




28/3 


2/3 


-6 


-2 


-112 


X 12 


23/3 


-2/3 


-5 




1 1 


*X 22 








1 


1 3 




2/3 


1/3 




-1 


1 1 -1 


C 


-44/3 


-1/3 


3 


1 


111124 



We now find that we can profitably increase the parameter associated with
$x_{11}$, and hence the pseudo-basic variable $x_{11}$ itself. We can do this until
$x_{12}$ becomes equal to zero. So we introduce a new parameter, and write





*! 


<*>! 


03 X 41 X 21 X 61 X 32 X 52 ^2 


,' 


23/3 
19/2 


-3/2 
-1/2 


15/2 
-11/2 
1 


-1 








03 


9/2 


-1/2 


-3/2 












23/2 


-3/2 


-15/2 




1 


1 




X 51 


10 


-1 


-8 


-1 


-1 1 








17 


-1 


-11 


-2 


-1 1 


2 




*X 12 




1 




1 




1 
-1 


1 
3 


*42 
C 


9/2 
-37/2 


-1/2 
1/2 


-5/2 
11/2 


-1 
1 


1 1 


1 
1 1 


1 -1 

2 4 



We now find that no pseudo-basic or nonbasic variable can profitably
be increased. In fact we have



T 



^ X 61 * 



11 

-2-* 



x 22 



3x 62 ) 



2 2 2 

So we have the optimum solution, given by 



+ x 21 - x 6i - -x 32 - 2x 52 ~ - 
*l s p ***<>* *!'!

x u - y, x 21 - 0, x 31 = 17, x 41 = 0, X 51 = 10, x = 

9 
x 12 = 0, x 22 - 0, Xtf = 0, x 42 = -, x S2 = 0, x 62 = 



5. RULES FOR CHOOSING PIVOTAL COLUMNS AND ROWS 

A fair amount of attention has recently been given to the problem of
choosing the pivotal column in the simplex method. This is particularly
important in large problems, to which the algorithm presented here may
have to be applied. It is therefore of interest to consider whether any
effective special rule for this can be devised, based on the special struc-
ture of the problem. It turns out that there is one important special case
where there is an obviously best choice of nonbasic variable to be in-
creased. And this special case suggests a plausible rule that can be
applied generally.

We consider the situation where

(1) there is only one pseudo-basic variable $x_{rl}$ in the subproblem
concerned,

(2) the coefficient $c_{(r)}$ of the associated parameter $\phi_{(r)}$ in the ex-
pression for $C$ is nonnegative, so $x_{rl}$ itself cannot usefully be increased,
and

(3) all the $c_{kl}$ are nonnegative, so that we have an optimum tableau for
the subproblem given the parameter values.

Now it is likely that the next new pseudo-basic variable will not be in
this subproblem, in which case the $c^*_{kl}$ for this subproblem will simply
be the corresponding $c_{kl}$. So we want to keep the $c_{kl}$ all nonnegative when
we make $x_{rl}$ nonbasic. This means that we should use the dual simplex
method to choose the pivotal column, i.e., we should choose the nonbasic
variable $x_{sl}$ such that $a_{rsl} > 0$, and

$$\frac{c_{sl}}{a_{rsl}} = \min_{k:\, a_{rkl} > 0} \frac{c_{kl}}{a_{rkl}}. \qquad (12)$$



We now show that in this case the rule is equivalent to the following
rule:

Choose the nonbasic variable $x_{sl}$ such that $c^*_{sl} < 0$, and

$$\frac{c_{sl} - c^*_{sl}}{c^*_{sl}} = \max_{k:\, c^*_{kl} < 0} \frac{c_{kl} - c^*_{kl}}{c^*_{kl}}. \qquad (13)$$

To prove this, let $\alpha_k = c_{kl}/a_{rkl}$. Then (12) says that we should pick
the smallest positive $\alpha_k$. But (13) tells us to maximize

$$\frac{c_{kl} - c^*_{kl}}{c^*_{kl}} = \frac{c_{(r)}a_{rkl}}{c_{kl} - c_{(r)}a_{rkl}}$$

from amongst those columns for which $c^*_{kl} < 0$, i.e., for which
$0 < \alpha_k < c_{(r)}$. Since this ratio equals $c_{(r)}/(\alpha_k - c_{(r)})$, it is negative
there and increases as $\alpha_k$ decreases. The maximum (i.e., least negative)
value of (13) is therefore obtained by taking the smallest positive $\alpha_k$. So (13)
always picks the same column as (12), if any such nonbasic variable can
profitably be increased at all.
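Rules (12) and (13) are easy to compare in code. The sketch below reads (13), and its $\epsilon$-modified form, as maximizing $(c_{kl} - c^*_{kl} + \epsilon)/c^*_{kl}$ over the columns with $c^*_{kl} < 0$. That reading, the helper names, and the data are our own assumptions. The data satisfy the special case in the text (one pseudo-basic variable, $c_{(r)} \ge 0$, all $c_{kl} \ge 0$), so $c^*_{kl} = c_{kl} - c_{(r)}a_{rkl}$ and both rules should pick the same column.

```python
from fractions import Fraction as F


def dual_simplex_column(c, a_r):
    """Rule (12): among columns with a_rkl > 0, minimize c_kl / a_rkl."""
    cands = [k for k in range(len(c)) if a_r[k] > 0]
    return min(cands, key=lambda k: c[k] / a_r[k]) if cands else None


def pricing_column(c, c_star, eps=0):
    """Rule (13), or its eps-modified form: among columns with c*_kl < 0,
    maximize (c_kl - c*_kl + eps) / c*_kl."""
    cands = [k for k in range(len(c)) if c_star[k] < 0]
    return max(cands, key=lambda k: (c[k] - c_star[k] + eps) / c_star[k]) if cands else None


# Special case of the text: one pseudo-basic variable, c_(r) >= 0, all c_kl >= 0.
c_r = F(2)
c = [F(1), F(3, 5), F(5), F(3, 2)]
a_r = [F(1), F(1, 2), F(-1), F(1)]
c_star = [ck - c_r * ak for ck, ak in zip(c, a_r)]   # equation (6) with one term
print(dual_simplex_column(c, a_r), pricing_column(c, c_star))   # 0 0
```

With no pseudo-basic variable in the subproblem, $c^*_{kl} = c_{kl}$, and a small positive `eps` makes `pricing_column` prefer the most negative $c^*_{kl}$, as the text intends for the modified rule.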

Formula (13) is advocated for general use on the following grounds: 

(1) ft is easy to apply, even using the product form of inverse for the 
subproblems, since it involves only pricing out one row, the cj^ row, in 
addition to the cfcf which are essential in any case. 

(2) It gives the best answer in the only situation where there is an ob- 
vious best answer. 

(3) It is nondimensional. 

(4) It has the advantage over other similar rules that it will give precedence to a column in which $c^*_k$ and $c_k$ are both negative. This seems desirable because a column with a negative $c^*_k$ is unsatisfactory in any circumstances. Insignificantly negative values of $c_k$ must of course be rejected. This can perhaps best be achieved by modifying (13) to read

$\dfrac{c^*_s + \epsilon}{c_s} = \max_{c_k < 0} \left( \dfrac{c^*_k + \epsilon}{c_k} \right)$



for some small positive $\epsilon$. The rule is then no longer strictly nondimensional; but the $\epsilon$ also serves the useful purpose of selecting large negative values of $c_k$ when there are no pseudo-basic variables in the subproblem, in which case $c^*_k = c_k$ for all k.
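The role of $\epsilon$ can be illustrated the same way. The sketch below assumes the modified rule maximizes $(c^*_k + \epsilon)/c_k$ over columns with $c_k < 0$; that exact form, the function name and the data are assumptions made for this illustration. It shows the two effects described above: with $c^*_k = c_k$ the rule favors the most negative $c_k$ (plain (13) would score every such column 1), and a barely negative $c_k$ is passed over.

```python
from fractions import Fraction


def rule_13_eps(c_star, c, eps):
    """Assumed epsilon form of (13): maximize (c*_k + eps) / c_k over c_k < 0."""
    eligible = [k for k, ck in enumerate(c) if ck < 0]
    if not eligible:
        return None
    return max(eligible, key=lambda k: Fraction(c_star[k] + eps) / c[k])


eps = Fraction(1, 100)

# No pseudo-basic variables, so c*_k = c_k: the most negative c_k wins.
c = [-1, -5, -2]
print(rule_13_eps(c, c, eps))                                  # 1

# A barely negative c_0: plain (13) (eps = 0) picks it, the eps form does not.
c_star, c = [0, 1], [Fraction(-1, 10**6), -1]
print(rule_13_eps(c_star, c, 0), rule_13_eps(c_star, c, eps))  # 0 1
```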

Having chosen the pivotal column, we must have a rule for choosing the 
pivotal row when the subproblem contains more than one pseudo-basic
variable. Our special case unfortunately throws no light on this problem, 
since there is then only one possible pivotal row. But the following pro- 
cedure seems sensible: 

Choose the pivotal row, i.e., the pseudo-basic variable $x_r$ to be made nonbasic, from among the pseudo-basic variables in the subproblem so as to maximize $C^{(r)} |a_{rs}|$, where $C^{(r)}$ is the parameter associated with $x_r$, and $a_{rs}$ is chosen to have the same sign as $c_s$ if possible.
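A sketch of this row rule follows, in Python. The data layout (a list of pseudo-basic row indices, with $C^{(r)}$ and the pivotal-column entries $a_{rs}$ held in lists indexed by row) and the function name are assumptions for illustration. Rows with $a_{rs} = 0$ are skipped, since they cannot serve as pivots, and if no usable row has the sign of $c_s$ the rule falls back to all usable rows.

```python
def choose_pivot_row(pseudo_rows, C, a_s, c_s):
    """Pick the pseudo-basic row r maximizing C[r] * |a_s[r]|.

    Rows whose a_rs has the same sign as c_s are preferred when any exist.
    Returns None if no pseudo-basic row has a nonzero entry in column s.
    """
    usable = [r for r in pseudo_rows if a_s[r] != 0]
    same_sign = [r for r in usable if c_s != 0 and (a_s[r] > 0) == (c_s > 0)]
    pool = same_sign or usable
    if not pool:
        return None
    return max(pool, key=lambda r: C[r] * abs(a_s[r]))


C = [5, 1, 3]       # C^(r) for rows 0, 1, 2 (made-up values)
a_s = [-1, 4, 2]    # entries a_rs of the pivotal column
print(choose_pivot_row([0, 1, 2], C, a_s, c_s=-2))  # 0: only row with a_rs < 0
print(choose_pivot_row([0, 1, 2], C, a_s, c_s=2))   # 2: 3 * 2 beats 1 * 4
```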

The merits of this rule are: 

(1) It is easy to apply.

(2) It is nondimensional. 

(3) It favors large pivots (i.e., large values of $|a_{rs}|$), and also large values of $C^{(r)}$, the latter implying that the pseudo-basic variable being made nonbasic was a very unprofitable one to introduce at a positive level.
(4) In the new expression for C in terms of the parameters and nonbasic variables, the coefficient of the new nonbasic variable, given by (11) as $c_s / a_{rs}$, is made positive if possible. (This is always possible if $c_s > 0$. If $c_s < 0$ it might not be.)



6. THE IMPORTANCE OF SPECIAL STRUCTURE 

Many people, and in particular George Dantzig, have stressed the 
importance of developing special methods for exploiting special matrix 
structure in linear programming problems. In Ref. 1, Dantzig suggests 
that the general simplex method may not be practical for systems con- 
taining many more than 100 equations. We are now talking about solving 
systems about ten times this size, but this only increases the importance 
of special methods, since really large problems are almost bound to 
have structure that can be taken advantage of. 

It seems likely that these special methods will have to be based on the general philosophy, first illustrated in the revised simplex method, that one should work with a compact formulation of the problem containing enough information to enable one to compute fairly easily the quantities one needs, rather than carry around all the quantities that one might conceivably need in a more or less explicit form. I hope I have succeeded in presenting the present algorithm as a natural and straightforward application of this philosophy.
Dual and Parametric Methods in Decomposition



Jean M. Abadie
A. C. Williams

1. INTRODUCTION 

The decomposition algorithm of Dantzig and Wolfe for the treatment of 
large linear programs [1], [2], may be briefly described as follows: (i) 
the number of rows in a linear program is reduced at the expense of intro- 
ducing (in general) a very large number of unknowns; then, (ii) the simplex 
algorithm is modified by the introduction of a "generalized pricing opera- 
tion" so as to render the new problem (the "extremal problem") amenable
to practical solution in spite of the large number of unknowns. The algo- 
rithm is basically a primal method. We show here that by introducing a 
different vector selection method, we are able to formulate an algorithm 
which is still a decomposition algorithm, but which is basically a dual 
method. In addition, this vector selection method also allows certain 
parametric linear programs to be solved by decomposition. 

These techniques can easily be incorporated into any general decompo- 
sition computer code, thus making possible important post-optimal para-
metric studies, as well as allowing flexibility of choice in the method of 
solution of the nonparametric problem. In this latter connection, we re- 
mark that dual feasible solutions are sometimes more easily come by than 
are primal feasible solutions. 

2. DUAL DECOMPOSITION FORMULATION 

We consider the linear programming problem 

$Ax = a$    (1.a)

$Bx = b$    (1.b)

$x \ge 0$    (1.c)

$\min fx$    (1.d)

where A is an $m_1 \times n$ matrix, B is an $m_2 \times n$ matrix, a and b are respectively $m_1$ and $m_2$ dimensional column vectors, f is an n dimensional row vector, and x is the n dimensional column vector of unknowns.

In order to treat (1) by decomposition, we let $\xi^1, \ldots, \xi^P$ be all the basic (or extreme) solutions to $Bx = b$, $x \ge 0$, and we let $\eta^1, \ldots, \eta^Q$ be a complete set of generators for the solutions to $Bx = 0$, $x \ge 0$. If we now introduce the unknowns $\lambda_p$, $p = 1, \ldots, P$; $\mu_q$, $q = 1, \ldots, Q$, then the linear program of (1) is equivalent to

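$\sum_{p=1}^{P} (A\xi^p) \lambda_p + \sum_{q=1}^{Q} (A\eta^q) \mu_q = a$    (2.a)

$\sum_{p=1}^{P} \lambda_p = 1$    (2.b)

$\lambda_p \ge 0, \quad \mu_q \ge 0$    (2.c)

$\min \sum_{p=1}^{P} (f\xi^p) \lambda_p + \sum_{q=1}^{Q} (f\eta^q) \mu_q$    (2.d)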

in the sense that if $(\lambda_p, \mu_q)$ is optimal for (2), then

$x = \sum_{p=1}^{P} \lambda_p \xi^p + \sum_{q=1}^{Q} \mu_q \eta^q$    (2.e)

is optimal for (1), and further, if x is optimal for (1), then for some $(\lambda_p, \mu_q)$ optimal for (2), x can be written in the form (2.e). The replacement of the linear program (1) by the linear program (2) is thus step (i) of the Dantzig-Wolfe method. Instead of step (ii), we intend to accomplish the same result by modifying Lemke's dual simplex method [3].

In connection with a remark made in the introduction, we may observe that if z is the minimum value for fx subject to $Bx = b$, $x \ge 0$, then the vector $(0, \ldots, 0, z)$ is immediately a dual feasible solution for (2). Note, however, that the presence of a zero column in B could easily cause fx to fail to have such a minimum. Of course, in any case, if fx has no minimum on $Bx = b$, $x \ge 0$, some kind of "Phase I" must be used.

In order to solve the linear program (2) by the dual simplex method, we assume that we have, at each iteration: (i) a basic solution $(\lambda, \mu)$, i.e. a basic solution to the constraints (2.a) and (2.b) [but which may not satisfy (2.c)], such that the corresponding dual solution $(\pi, \rho) = (\pi_1, \ldots, \pi_{m_1}, \rho)$ is feasible, and (ii) an inverse matrix, i.e. the inverse of the matrix whose columns are the columns corresponding to the various $\lambda_p, \mu_q$ of the given basic solution. This matrix has dimension $(m_1 + 1) \times (m_1 + 1)$. Let the $i$th row of the inverse be denoted by $(u_i, w_i) = (u_i^1, \ldots, u_i^{m_1}, w_i)$.

Let us review briefly and without proof the steps required for linear 
programming with the dual simplex method. Let it be required to solve 

$Ax = a, \quad x \ge 0, \quad \min fx$
As above, we assume at each iteration (i) that we have a basic solution to $Ax = a$ (but $x \ge 0$ may not be satisfied) such that the corresponding dual solution $\pi$ is feasible, i.e. $\pi A \le f$; and (ii) that we have an inverse matrix with rows denoted by $u_i$. The current basic solution x is then given by $\alpha = Ua$. Now, if every component of the basic solution $x = \alpha$ is nonnegative (i.e. the current primal solution is feasible), then the current primal solution is optimal. But if there is some component, say $x_r = \alpha_r$, which is negative, then the transform of row r may be used as a "pivot row." The selection of the column to enter the basis is as follows. Let the columns of A be written $a^1, \ldots, a^n$. Call a column $a^i$ admissible if $u_r a^i < 0$. If there are no admissible columns, there is no solution to the constraints $Ax = a$, $x \ge 0$. Suppose, then, that the admissible set of columns is not empty. Then an admissible column $a^j$, for which the ratio



$\dfrac{\pi a^j - f_j}{u_r a^j}$    (3)

is a minimum over all admissible columns, is selected for the basis. This 
vector is then introduced into the basis; a new inverse, a new primal, and 
a new dual solution are computed. The next iteration then commences. 
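The column selection just described is easy to write down when the columns of A are explicit. The following Python sketch does so with exact rational arithmetic; the function and argument names are illustrative assumptions, and the data are made up to satisfy dual feasibility ($\pi a^j \le f_j$), so every ratio is nonnegative.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def entering_column(pi, f, columns, u_r):
    """Dual simplex column choice for pivot row r.

    A column a^j is admissible when u_r a^j < 0; among admissible columns the
    ratio (pi a^j - f_j) / (u_r a^j) of (3) is minimized. Returns None when no
    column is admissible (then Ax = a, x >= 0 has no solution).
    """
    best, best_ratio = None, None
    for j, col in enumerate(columns):
        denom = dot(u_r, col)
        if denom < 0:
            ratio = Fraction(dot(pi, col) - f[j], denom)
            if best_ratio is None or ratio < best_ratio:
                best, best_ratio = j, ratio
    return best


# Made-up, dual feasible data (pi a^j <= f_j for every column).
pi, f = [0, 1], [1, 2, 3]
columns = [[1, 0], [0, -1], [2, -1]]
print(entering_column(pi, f, columns, u_r=[-1, 1]))   # 0: ratios 1, 3, 4/3
```

The loop touches every column of A, which is exactly what cannot be done for program (2).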

Turning now to the linear program of the form given by (2), we see that 
difficulties will arise in the selection of the admissible vector for which 
the ratio (3) is a minimum over all admissible vectors. This is so because 
in the decomposition method the columns of the linear program (2) are not 
explicit, and further, they are in general so very numerous as to make 
their explicit calculation out of the question. Now the decomposition 
method for linear programming is a modification of the simplex method 
whereby linear programs of the type (2) may be solved without having all 
the columns calculated explicitly. In fact, in the simplex method the only 
difficulty encountered by not having explicit columns is that of selecting the 
vector to enter the basis. The Dantzig-Wolfe decomposition method, then, 
gives an algorithm in terms of a linear "subproblem" whereby the selec- 
tion can be made. This process may be described as a generalized pricing 
operation. 

The situation here is similar. The only difficulty in the present dual de- 
composition algorithm is encountered in the selection of a vector for the 
basis, and we overcome this difficulty by developing a selection algorithm 
which does not require the vectors to be explicit. 

Let us see what has to be done. Suppose the pivot row r has been selected. Then the ratio (3) for columns of the type $A\xi^p$ is given by

$\dfrac{\pi A\xi^p + \rho - f\xi^p}{u_r A\xi^p + w_r}$    (4)

and for columns of the type $A\eta^q$ that ratio is given by

$\dfrac{\pi A\eta^q - f\eta^q}{u_r A\eta^q}$    (5)

The problem which we are considering is thus reduced to the problem of finding that $\xi^p$ or $\eta^q$ from among the $\xi$, $\eta$ for which the denominators of (4) and (5) are negative and for which the ratio (4) or (5) is a minimum over such $\xi$, $\eta$. We note also that the numerators of these ratios are nonpositive, since $(\pi, \rho)$ is feasible.

The determination of the correct vector $\xi$ or $\eta$ according to the above
criterion is the "subproblem" which has to be solved at each iteration 
using the current values of the dual variables in (4) and (5). In Section 4 
of this paper we give an algorithm, in terms of linear programming, for 
the solution of this nonlinear program. Of course, once this generalized 
vector selection step has been taken, the calculation continues exactly as 
in the dual simplex method. 

Before spelling out the selection algorithm, however, we shall show 
that certain parametric linear programming problems also reduce to this 
case. 



3. PARAMETRIC LINEAR PROGRAMMING

We consider two parametric linear programming (PLP) problems to be 
solved by the decomposition method.

The first of these is that of finding the optimal solution $x(\theta)$ as a function of the parameter $\theta$ for the parametric linear program

$Ax = a + \theta\bar a, \quad Bx = b, \quad x \ge 0, \quad \min fx$

Again, we consider the linear program (2), where in (2.a) we replace a by $a + \theta\bar a$.

We assume that an optimal basic solution $(\lambda, \mu)$ for $\theta = \theta_0$ (initially $\theta_0 = 0$) has been found, and that as a by-product of the calculation we have also found an optimal dual solution $(\pi, \rho)$ and the basis inverse. We now wish to compute an optimal solution for all $\theta > \theta_0$ for which such solutions exist.

The steps required for this type of PLP problem with the simplex method will now be reviewed. Suppose we have a basic optimal solution for

$Ax = a + \theta\bar a, \quad x \ge 0, \quad \min fx$

for some $\theta = \theta_0$. Let the rows of the inverse U of the basis be denoted by $u_i$. Let $\pi$ be the optimal dual solution and let $\alpha = Ua$ and $\bar\alpha = U\bar a$. Now if every element of the vector $\bar\alpha$ is nonnegative, we have that $x(\theta) = \alpha + \theta\bar\alpha$ for all $\theta \ge \theta_0$. But if there are some elements of $\bar\alpha$ which are negative, we define



$\theta_1 = \min_{\bar\alpha_i < 0} \left( -\dfrac{\alpha_i}{\bar\alpha_i} \right) = -\dfrac{\alpha_r}{\bar\alpha_r}$

Then $x(\theta) = \alpha + \theta\bar\alpha$ for $\theta_0 \le \theta \le \theta_1$. In this case a change of basis is required in order to compute $x(\theta)$ for $\theta > \theta_1$. Row r is chosen as the pivot row, and as before we consider the admissible columns $a^j$, i.e., those columns for which $u_r a^j < 0$. If there are no admissible columns, there are no solutions to the constraints for $\theta > \theta_1$. If there are any admissible columns, then we choose that admissible column which minimizes the ratio given by (3).
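The breakpoint at which the basis must change can be computed as below. The sketch assumes the basis inverse U is available as a dense list of rows and the data are integers, so that Fraction gives exact ratios; the function name is hypothetical. It returns $\theta_1 = \min_{\bar\alpha_i < 0} (-\alpha_i/\bar\alpha_i)$, the largest $\theta$ for which $\alpha + \theta\bar\alpha$ stays nonnegative, together with the blocking row r, which then serves as the pivot row for the ratio test (3).

```python
from fractions import Fraction


def mat_vec(U, v):
    return [sum(u * x for u, x in zip(row, v)) for row in U]


def rhs_breakpoint(U, a, a_bar):
    """Return (theta_1, r) for x(theta) = alpha + theta * alpha_bar, where
    alpha = U a and alpha_bar = U a_bar; (None, None) if alpha_bar >= 0,
    i.e. the current basis stays feasible for every larger theta."""
    alpha, alpha_bar = mat_vec(U, a), mat_vec(U, a_bar)
    rows = [i for i, ab in enumerate(alpha_bar) if ab < 0]
    if not rows:
        return None, None
    r = min(rows, key=lambda i: Fraction(-alpha[i], alpha_bar[i]))
    return Fraction(-alpha[r], alpha_bar[r]), r


print(rhs_breakpoint([[1, 0], [0, 1]], [4, 6], [-2, -1]))  # theta_1 = 2, row r = 0
```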

Clearly, then, the parametric linear program problem by decomposition, 
i.e., for a program of the type (2), is similarly reduced to the problem of 
selecting a vector for which the ratio (4) or (5) is a minimum, subject to
the constraint that the denominator be negative. The algorithm of Section 
4 is again applicable. 

The second parametric linear programming problem is

$Ax = a, \quad Bx = b, \quad x \ge 0, \quad \min (f + \theta\bar f)x$

which we reformulate again in the form (2). We assume that an optimal solution $(\lambda, \mu)$ is available for $\theta = 0$, also an optimal dual solution $(\pi, \rho)$ and the basis inverse. Let $(\bar\pi, \bar\rho)$ be the U-transform of the vector whose elements are the elements of $(\bar f\xi^p, \bar f\eta^q)$ corresponding to the basis elements $(\lambda_p, \mu_q)$. Now the vector to enter the basis [so as to compute $x(\theta)$ for $\theta > 0$] is that vector for which

$\dfrac{(\pi A - f)\xi^p + \rho}{(\bar\pi A - \bar f)\xi^p + \bar\rho} \quad \text{or} \quad \dfrac{(\pi A - f)\eta^q}{(\bar\pi A - \bar f)\eta^q}$

is a minimum over the set of all such vectors for which the denominator is negative. (If the denominator is nonnegative for all $\xi^p$ and all $\eta^q$, then the current solution is optimal for all $\theta \ge 0$.) The problem is again reduced to the previous ones.



4. THE GENERALIZED VECTOR SELECTION ALGORITHM 

In the preceding sections we reformulated linear programming problems 
by decomposition, and considered the task of solving the resulting extremal 
linear program by the dual method, or of finding optimal solutions as a 
function of a linear parameter either in the cost row or in the inhomoge- 
neous part of the extremal constraints. We showed that each of these 
problems is reduced to a succession of problems of the following type. 

Let $\xi^1, \xi^2, \ldots, \xi^P$ be all the basic solutions to $Bx = b$, $x \ge 0$. Let $\eta^1, \ldots, \eta^Q$ be a complete set of generators for the solutions to $Bx = 0$, $x \ge 0$. Assume that the solution set is not empty. Let c and d be given vectors (they stand respectively for $\pi A - f$ and $u_r A$ of (4) and (5) above), and let $\rho$ and w be given numbers. Assume that for each basic solution

$c\xi^p + \rho \le 0, \quad p = 1, \ldots, P$

and that for each generator

$c\eta^q \le 0, \quad q = 1, \ldots, Q$

Define an admissible basic solution as a basic solution which satisfies $d\xi + w < 0$, and an admissible generator as a generator which satisfies $d\eta < 0$. Define the admissible set S as the union of the set of admissible basic solutions and the set of admissible generators. Call the members of the admissible set the admissible vectors. On the admissible set we define the real valued function

$\nu(\xi) = \dfrac{c\xi + \rho}{d\xi + w}$ for an admissible basic solution $\xi$; $\quad \nu(\eta) = \dfrac{c\eta}{d\eta}$ for an admissible generator $\eta$.

For any $v \in S$ we may call $\nu(v)$ the value of v. Note that $\nu(v) \ge 0$ for all $v \in S$.

Problem 

It is required to find an admissible vector for which the value is a min- 
imum over the set of admissible vectors, or else to determine that the ad- 
missible set is empty. 
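If the basic solutions and generators were available as explicit lists (which is precisely what decomposition avoids), the problem could be solved by enumeration. The Python sketch below does this and is meant only as a reference for checking a real implementation on toy data. The function name and the data are hypothetical; the data satisfy the sign assumptions $c\xi^p + \rho \le 0$ and $c\eta^q \le 0$.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def min_value_by_enumeration(c, d, rho, w, basic_solutions, generators):
    """Return (value, vector) minimizing nu over the admissible set, or None.

    nu(xi) = (c.xi + rho) / (d.xi + w) for basic solutions with d.xi + w < 0;
    nu(eta) = c.eta / d.eta for generators with d.eta < 0.
    """
    candidates = []
    for xi in basic_solutions:
        den = dot(d, xi) + w
        if den < 0:
            candidates.append((Fraction(dot(c, xi) + rho, den), xi))
    for eta in generators:
        den = dot(d, eta)
        if den < 0:
            candidates.append((Fraction(dot(c, eta), den), eta))
    return min(candidates, key=lambda pair: pair[0]) if candidates else None


basic = [(3, 0), (0, 3), (1, 1)]
gens = [(1, 0), (0, 1)]
print(min_value_by_enumeration([-1, -2], [-1, 1], 1, 0, basic, gens))
# value 2/3, attained at the basic solution (3, 0)
```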

We now give an algorithm in terms of a succession of linear programs 
for the solution of this problem. The algorithm is independent of the 
method used for the solution of the individual linear programs, subject 
only to the following condition. Whatever method is used must find an op- 
timal basic solution (in case the optimal is not unique), or in the case of 
the objective function not bounded, a solution to $Bx = 0$, $x \ge 0$ from some
finite set of generators. The simplex method has these properties, but the 
decomposition method does not have the first property. These assumptions 
may obviously be relaxed for specific algorithms. 

When the simplex method is used, however, certain modifications of the 
algorithm are possible. These modifications allow the calculations to be 
set out in a "tableau" format and, in addition, appear to reduce the amount of calculation required.

Very briefly, the general algorithm consists of constructing a sequence of admissible vectors $v^t$ such that the sequence $\{\nu(v^t)\}$ of their values is monotone strictly decreasing. Since the $v^t$ are drawn from a finite set, termination is thereby assured. We then show that the terminal vector is the vector sought. If for some t the value $\nu(v^t) = 0$ is attained, the calculation is terminated forthwith, since there can be no admissible vector with value less than zero.

The Algorithm

(The numbers in parentheses refer to proofs which appear directly fol- 
lowing the description of the algorithm.) 
Step 1. Consider the linear program $Bx = b$, $x \ge 0$, $\min dx + w$.

Case 1. We find an optimal basic solution $\xi$. Then every generator satisfies $d\eta \ge 0$, i.e. there is no admissible generator.

Subcase A. For an optimal basic solution, we have $d\xi + w \ge 0$. Then there is no admissible basic solution, i.e. the admissible set is empty.

Subcase B. For an optimal basic solution, we have $d\xi + w < 0$. Then take $v^1 = \xi$, and put

$\nu_1 = \dfrac{cv^1 + \rho}{dv^1 + w}$

If $\nu_1 = 0$, $v^1$ is optimal. If $\nu_1 > 0$, go to Step 3.

Case 2. We find that there is no vector which minimizes $dx + w$, i.e., we find a vector $\eta$ which satisfies $\eta \ge 0$, $B\eta = 0$, $d\eta < 0$. Set $v^1 = \eta$ and put

$\nu_1 = \dfrac{cv^1}{dv^1}$

If $\nu_1 = 0$, $v^1$ is optimal. If $\nu_1 > 0$, go to Step 2.
Step 2. Consider the linear program $Bx = b$, $x \ge 0$, $\min \nu_t(dx + w) - (cx + \rho)$.

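Case 1. We find a vector $\eta \ge 0$ such that $B\eta = 0$, $\nu_t d\eta - c\eta < 0$. Then $\eta$ is admissible (1). Set $v^{t+1} = \eta$ and define

$\nu_{t+1} = \dfrac{cv^{t+1}}{dv^{t+1}}$

Then $\nu_{t+1} < \nu_t$ (2). If $\nu_{t+1} = 0$, $v^{t+1}$ is optimal. If $\nu_{t+1} > 0$, return to Step 2.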

Case 2. We find an optimal basic solution $\xi$. Then the admissible generator $v^t$ (the last admissible vector computed) has a value which is minimum over the set of admissible generators (3).
Subcase A. For $\xi$ an optimal basic solution,

$\nu_t(d\xi + w) - (c\xi + \rho) \ge 0$

is satisfied. Then the admissible vector $v^t$ is optimal (4).
Subcase B. For $\xi$ an optimal basic solution,

$\nu_t(d\xi + w) - (c\xi + \rho) < 0$

is satisfied. Then $\xi$ is an admissible basic solution (5). Set $v^{t+1} = \xi$ and put

$\nu_{t+1} = \dfrac{cv^{t+1} + \rho}{dv^{t+1} + w}$

Then $\nu_{t+1} < \nu_t$ (6). If $\nu_{t+1} = 0$, $v^{t+1}$ is optimal. If $\nu_{t+1} > 0$, go to Step 3.

Step 3. Consider the linear program $Bx = b$, $x \ge 0$, $\min \nu_t(dx + w) - (cx + \rho)$.

This problem now always has a minimal vector (7). Let $\xi$ be an optimal basic solution. Then, since $v^t$ (the last admissible vector calculated) is feasible, we have

$\nu_t(d\xi + w) - (c\xi + \rho) \le \nu_t(dv^t + w) - (cv^t + \rho)$

But by definition of $\nu_t$ the right-hand side of this inequality is zero.

Case 1. $\nu_t(d\xi + w) - (c\xi + \rho) < 0$. Then $\xi$ is an admissible basic solution (5). Set $v^{t+1} = \xi$ and put

$\nu_{t+1} = \dfrac{cv^{t+1} + \rho}{dv^{t+1} + w}$

Then $\nu_{t+1} < \nu_t$ (6). If $\nu_{t+1} = 0$, $v^{t+1}$ is optimal. If $\nu_{t+1} > 0$, return to Step 3.



Case 2. $\nu_t(d\xi + w) - (c\xi + \rho) = 0$. Then the admissible basic solution $v^t$ is optimal (8).

(1) Since $c\eta \le 0$ for every generator and since $\nu_t > 0$, the result $d\eta < 0$ follows immediately from $\nu_t d\eta < c\eta \le 0$.
(2) From $\nu_t d\eta - c\eta < 0$ and $d\eta < 0$ we have

$\nu_t > \dfrac{c\eta}{d\eta} = \dfrac{cv^{t+1}}{dv^{t+1}} = \nu_{t+1}$

(3) Since $Bx = b$, $x \ge 0$, $\min \nu_t(dx + w) - (cx + \rho)$ has a minimum solution, we have that every generator $\eta$ satisfies

$\nu_t d\eta - c\eta \ge 0$

Therefore every admissible generator $\eta'$ satisfies

$\dfrac{c\eta'}{d\eta'} \ge \nu_t$

(4) Every basic solution $\xi$ satisfies

$\nu_t(d\xi + w) - (c\xi + \rho) \ge 0$

since that condition is satisfied by an optimal basic solution. Therefore, for every admissible basic solution $\xi'$, we have

$\dfrac{c\xi' + \rho}{d\xi' + w} \ge \nu_t$

Thus the value $\nu_t$ of the admissible vector $v^t$ is minimal over the set of admissible basic vectors as well as over the set of admissible generators.

(5) Since $c\xi + \rho \le 0$ for all basic solutions and since $\nu_t > 0$, the result $d\xi + w < 0$ follows immediately.

(6) From $\nu_t(d\xi + w) - (c\xi + \rho) < 0$ and $d\xi + w < 0$ we have

$\nu_t > \dfrac{c\xi + \rho}{d\xi + w} = \dfrac{cv^{t+1} + \rho}{dv^{t+1} + w} = \nu_{t+1}$

(7) Suppose there were no minimum vector for this linear program. Then there is a generator $\eta$ such that

$\nu_t d\eta - c\eta < 0$    (7)

But then $d\eta < (1/\nu_t) c\eta \le 0$, i.e., such a generator is admissible.

Now if Step 3 was entered from Step 1, Case 1, Subcase B, there can be no admissible generators, so there is a contradiction.

If Step 3 was entered from Step 2, Case 2, Subcase B, or from Step 3, then we know that $Bx = b$, $x \ge 0$, $\min \nu_{t-1}(dx + w) - (cx + \rho)$ (where $\nu_{t-1} > \nu_t$) has a minimum vector, i.e., every generator must satisfy

$\nu_{t-1} d\eta - c\eta \ge 0$

But (7), $d\eta < 0$, $\nu_t < \nu_{t-1}$ yield

$\nu_{t-1} d\eta - c\eta < \nu_t d\eta - c\eta < 0$

which is again a contradiction.

(8) Precisely the argument of (3) shows that $\nu_t$ is minimal over the set of admissible generators. Then precisely the argument of (4) shows that $v^t$ is optimal.
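The three steps can be sketched in Python once the linear programs are delegated to an oracle. The oracle here is an assumption standing in for the simplex method: it is handed explicit lists of all basic solutions and a complete set of generators, so it only suits toy data, and for $\min gx$ on $Bx = b$, $x \ge 0$ it returns either a generator with $g\eta < 0$ (no minimum) or a minimizing basic solution, which is the property the algorithm requires. Exact fractions keep the strict decrease of $\nu_t$ exact. All names are illustrative.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lp_oracle(g, basic_solutions, generators):
    """Toy stand-in for the simplex method on Bx = b, x >= 0, min g.x."""
    for eta in generators:
        if dot(g, eta) < 0:
            return "unbounded", eta
    return "optimal", min(basic_solutions, key=lambda xi: dot(g, xi))


def select_vector(c, d, rho, w, basic_solutions, generators):
    """Steps 1-3: return (v, nu) with nu minimal over admissible vectors,
    or None if the admissible set is empty."""
    # Step 1: min dx + w.
    status, v = lp_oracle(d, basic_solutions, generators)
    if status == "optimal":
        if dot(d, v) + w >= 0:
            return None                            # Case 1, Subcase A
        nu = Fraction(dot(c, v) + rho) / (dot(d, v) + w)
        step = 3                                   # Case 1, Subcase B
    else:
        nu = Fraction(dot(c, v)) / dot(d, v)
        step = 2                                   # Case 2
    while nu > 0:
        # Steps 2 and 3 both solve min nu_t (dx + w) - (cx + rho).
        g = [nu * dj - cj for dj, cj in zip(d, c)]
        const = nu * w - rho
        status, x = lp_oracle(g, basic_solutions, generators)
        if status == "unbounded":
            # Step 2, Case 1; proof (7) rules this out in Step 3.
            assert step == 2
            v, nu = x, Fraction(dot(c, x)) / dot(d, x)
            continue
        if dot(g, x) + const >= 0:
            return v, nu                           # Step 2 Subcase A or Step 3 Case 2
        # Step 2 Subcase B or Step 3 Case 1: x is an admissible basic solution.
        v, nu, step = x, Fraction(dot(c, x) + rho) / (dot(d, x) + w), 3
    return v, nu                                   # value 0 cannot be beaten


basic = [(3, 0), (0, 3), (1, 1)]
gens = [(1, 0), (0, 1)]
print(select_vector([-1, -2], [-1, 1], 1, 0, basic, gens))  # ((3, 0), Fraction(2, 3))
```

On this sample data, Step 1 finds the generator (1, 0) with value 1, Step 2 replaces it by the basic solution (3, 0) with value 2/3, and Step 3 confirms that vector as optimal. The sketch always runs each linear program to completion and does not attempt the shortcuts mentioned in the Discussion.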

Discussion 

The problem of optimizing a given function subject to constraints is 
often solved in two parts. In the first part it is determined whether or not 
there are any solutions to the constraints, and if there are, such a solution 
is produced. In the second part, then, new solutions are successively cal- 
culated, each with a more nearly optimal value than the preceding one. In 
the above algorithm Step 1 is such a first part, its purpose being to deter-
mine whether or not there are any admissible vectors and, if there are, 
to produce one. Steps 2 and 3 are then the second part operation, in that 
at each step a new admissible vector with value less than the preceding one 
is calculated. 

The algorithm requires that on each step either we optimize or we 
produce the generator which shows that there is no minimum. Actually, 
however, once any admissible vector has been found (not necessarily a
minimum as required by Step 1), the operations of Steps 2 and 3 may be 
commenced immediately, and once these operations are commenced there 
is no need to run the indicated linear programs to optimality; in fact, if the simplex method is used, it is the usual case that only a single simplex iteration need be done to reduce the value of $\nu$.

The underlying principle is this. At any point in the algorithm we may 
replace the current admissible vector by an admissible vector whose value 
is not greater, provided that at the same time the current value $\nu_t$ is also replaced by the value of the new admissible vector. We note in this connection that in Step 2 the monotonicity of $\{\nu_t\}$ does not depend on what the previous admissible vector was, but depends only on the number $\nu_t$ in the current objective function. In Step 3, there can be no admissible generator with value not greater than that of the current admissible vector. Thus the replacement can only be by an admissible basic solution. But here the monotonicity of $\{\nu_t\}$ depends only on the feasibility of the previous admissible vector, and the number $\nu_t$.

Therefore, when the simplex method is used, the algorithm may be 
modified as follows. After each simplex iteration if we obtain a new ad- 
missible basic vector with value not greater than the previous value (as we 
must in Step 3, and as we may in Step 2), the new vector and the new value 
are used immediately in the next simplex iteration. Clearly, any method 
for resolving degeneracy for the simplex method can be used here. 
Finally, we remark that the necessary calculations are conveniently done 
by forming rows for c and d, adjoining them to the B matrix, and carrying 
out transformations on them along with the other rows of B. 
Convex Partition Programming



J. B. Rosen 

1. INTRODUCTION 

There are a number of important types of mathematical programming 
problems which lead naturally to a block diagonal structure for the con- 
straint coefficient matrix. One kind of problem which may have this 
structure is a multiple plant or refinery model where each plant or re- 
finery is represented by a different block, the blocks being coupled by raw 
material allocation and product distribution. Another type of problem lead- 
ing to a block diagonal structure is a dynamic model with storage, where 
each block represents a single time period with storage between successive 
time periods as the coupling between blocks. A large block diagonal struc- 
ture may also arise from a stochastic programming problem where the 
constraint right-hand side vector is specified as a random vector selected 
from a finite set with known probabilities [4].

In all of these cases the complete problem can be represented as a 
number of smaller problems tied together by coupling equations, coupling 
variables or both. The first proposal specifically taking advantage of this 
structure for linear problems was the decomposition principle of Dantzig 
and Wolfe [3].

The partition programming method for the solution of convex problems 
with a block diagonal structure [8] is applicable to the nonlinear problem 
when the blocks are coupled by a set of coupling variables, denoted by an 
s-dimensional vector y. This structure is shown in Fig. 1, where a prob- 
lem with t blocks is illustrated. A formal statement of the complete prob- 
lem is given by (2.1) in the next section. In this paper we will consider the 
multiblock problem with the structure shown in Fig. 1, which we will call 
the dual form. The right-hand side vector for each block, bjty), 
i = 1, , . . , t is assumed to be a convex vector function of the vector y. 
For the special case where the b<y) are linear in y we have a com- 
pletely linear multiblock problem. For this linear case the corresponding 
primal problem is a multiblock problem with coupling constraints in the 
form normally considered for the decomposition algorithm. The solution 
of such linear multiblock problems by partition programming in both the 
primal and dual form has previously been presented [91 and will be de- 
scribed in a separate paper. A similar approach for the linear problem in 
the dual form has been developed independently by Beale [2] . 
Fig. 1. Complete problem, $\ell$ blocks.

The partition programming algorithm is based largely on the fact that for any fixed value of y, the complete problem (2.1) reduces to a set of $\ell$ relatively small linear subproblems, each of which can be solved independently of the others. These $\ell$ subproblems (called Problem I) are

$\phi_i(y) = \min_{x_i} \{ c_i^T x_i \mid A_i^T x_i \ge b_i(y) \}, \quad i = 1, \ldots, \ell$    (1.1)

Furthermore, it is shown by Theorem 1 that

$\phi(y) = \sum_{i=1}^{\ell} \phi_i(y)$    (1.2)

is a convex function of y. A global minimum of the complete problem (2.1)
is therefore given by the minimum of the convex function $\phi(y)$. For a specific feasible value $y = y_0$, the optimal solution of each of the linear subproblems (1.1) gives a vector $x_i^0$, the value $\phi_i(y_0)$, and a nonsingular basis $\bar A_i$ and corresponding vector $\bar b_i(y)$ such that $\bar A_i^T x_i^0 = \bar b_i(y_0)$ and $\bar A_i^{-1} c_i \ge 0$. We can now reduce the complete problem to one in the y-space only by (temporarily) requiring that each vector $x_i$ be given by $\bar A_i^T x_i = \bar b_i(y)$, $i = 1, \ldots, \ell$. We can then explicitly formulate an s-dimensional convex problem in the y-space, which we call Problem II. Problem II has a convex objective function and linear constraints and can be solved by the gradient projection method, for which an efficient computer program is available [10].

It is shown by Theorem 2 that we can recognize the Kuhn-Tucker conditions for the complete problem optimum in terms of the optimum Problem I and II solutions. If $\phi(y_0)$ is not a minimum, then Problem II will either give a new feasible value of $y = y_1$ with $\phi(y_1) < \phi(y_0)$, or will show how to make a basis change in one or more of the subproblems (1.1) so that such a value $y_1$ can be found. The iteration procedure is then continued by solving Problem I with the new value $y = y_1$. The solution of Problem II in each iteration is a convergent (but not necessarily finite) procedure, and it is shown in Theorem 3 that only a finite number of iterations are required. The justification for the linearization of constraints in Problem II is given by Theorem 4.

In Section 4, the gradient projection algorithm is summarized. It is
also shown there that the general convex programming problem (min- 
imize a convex function in a convex region) can be put in the form (2.1) by 
introducing appropriate linear slack variables. The way in which linear 
equalities can be handled is also described there. The partition pro- 
gramming algorithm is described in detail for the multiblock convex 
problem in the Appendix. 

The partition programming algorithm has been coded for the IBM 7090 
computer and used successfully to solve a number of linear and nonlinear 
problems, including a nonlinear multi-refinery model. This computational 
experience is described elsewhere [11]. Two aspects of this algorithm 
should be emphasized. First, that the size of the subproblems remains the 
same throughout the iterative solution, and in fact that it is never necessary 
to solve a single problem with more than m variables, where $m = \max\{m_i, s\}$. Second, that since a feasible solution is obtained at each
cycle the optimization may be terminated before the global minimum has 
been reached, and still give an improved feasible vector. 

With certain obvious exceptions we use capital Roman letters for 
matrices, lower case Roman letters for vectors and Greek letters for 
scalars. A subscript normally denotes the corresponding block, except 
on y where it denotes a specific vector. 

2. OPTIMALITY CONDITIONS FOR COMPLETE PROBLEM 

A general convex problem will now be stated in the form suitable for 
optimization by the partition programming algorithm. The problem is 
shown in Fig. 1 and consists of linear submatrices $A_i$ and corresponding vectors $x_i$, $i = 1, \ldots, \ell$, with a block diagonal structure. These submatrices are coupled through the single coupling vector y, in the following sense: the right hand side of each constraint is a convex function of the vector y. The constraints for the complete problem may therefore depend in a nonlinear way on y, but for any fixed value of y each submatrix is linear and independent. The objective function to be minimized is a linear function of all the vectors $x_i$, and the minimum is to be obtained over all vectors $x_i$, y, which satisfy the constraints. The complete $\ell$-block problem may therefore be stated as follows:

$\min_{x_i,\, y} \left\{ \sum_{i=1}^{\ell} c_i^T x_i \;\middle|\; A_i^T x_i \ge b_i(y), \; i = 1, \ldots, \ell \right\}$    (2.1)



where the $b_i(y)$ are differentiable convex vector functions of the s-dimensional vector y. Each $A_i$ is a constant matrix with dimensions $(m_i \times k_i)$. Each $c_i$ is a constant $m_i$-dimensional vector, and the superscript T denotes the transpose. A vector inequality means that the inequality applies to every component of the vector. These vectors and matrices have the appropriate dimensions shown in Fig. 1.

The assumption that $b_i(y)$ is a convex function of y means that each component of the vector $b_i(y)$ is a convex function of y. This includes the special cases where some or all components of $b_i$ are constant and where some or all components of $b_i$ depend linearly on y. For the completely linear case we have

$b_i(y) = b_i - D_i y, \quad i = 1, \ldots, \ell$    (2.2)

where the $D_i$ are constant matrices with the dimensions $(k_i \times s)$. It is also assumed that the minimum given by (2.1) is bounded. The problem is stated in the form of inequalities so that there will be at least as many inequalities (including nonnegativity) as variables in each block, $k_i \ge m_i$, $i = 1, \ldots, \ell$. For a completely linear problem this is the natural structure for the dual problem. The partition programming optimization algorithm to be described in this section is based on this dual structure.

As a matter of convenience certain additional assumptions will be made about the system of inequalities or constraints in (2.1). Any nonnegativity requirements on the components of the $x_i$ or y vectors are assumed to be included as part of the corresponding matrix $A_i^T$ and $b_i(y)$ vector. Note that for constraints which involve only $x_i$ variables the corresponding right hand side is constant, and for constraints in only the y variables the corresponding rows of the $A_i^T$ matrix are zero. Finally, two feasibility assumptions are made. First, that there exist feasible points $(x_i, y)$ which are interior to all nonlinear constraints, that is, points $(x_i, y)$ which satisfy $A_i^T x_i \ge b_i(y)$, $i = 1, \ldots, \ell$, with a strict inequality for every nonlinear component of the $b_i(y)$. This assumption insures the satisfaction of the Kuhn-Tucker constraint qualification [3]. Second, that those constraints
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which are completely linear (linear in y as well as the x^> determine a 
bounded region. This can always be accomplished by imposing suitable 
upper and lower bounds on each component of the vectors y and x^. A 
vector y for which there exist vectors X} i - 1, . . . , I, such that Aj x^ 
sbi(y)> i - 1 -.* will be called a feasible vector y. It is also as- 
sumed that a feasible vector y = y is known. Provided it exists, such a 
vector y can be found by a "feasibility solution 1 * which reduces to zero 
a penalty on each constraint violation. 

An important structural aspect of the problem in this dual form should be emphasized, since it is basic to the partition programming algorithm. For any fixed value of $y$, the complete problem (2.1) reduces to a set of $\ell$ relatively small linear subproblems, each of which can be solved independently of the others. These $\ell$ subproblems are given by (1.1). The objective function $\Phi(y)$ for the complete problem, considered as a function of $y$, is given by (1.2), so that the original problem (2.1) is now equivalent to the minimization problem $\min \Phi(y)$, where $y$ is chosen from the set of all values which satisfy $A_i^T x_i \ge b_i(y)$, $i = 1,\dots,\ell$, for some vectors $x_i$. We will now prove a theorem from which it will follow directly that $\Phi(y)$ is a convex function of $y$.

Theorem 1 

Let $b(y)$ be a convex vector function of the vector $y$. Then

$$\varphi(y) = \min_x\,\{c^T x \mid A^T x \ge b(y)\} \qquad (2.3)$$

is a convex function of $y$. Furthermore, the region of definition of $\varphi(y)$ is convex.

Proof 

We consider two feasible values $y_1$ and $y_2$ of the vector $y$. Since they are feasible, there exist vectors $x_1$ and $x_2$ such that

$$\varphi(y_1) = \min_x\,\{c^T x \mid A^T x \ge b(y_1)\} = c^T x_1, \qquad \varphi(y_2) = \min_x\,\{c^T x \mid A^T x \ge b(y_2)\} = c^T x_2 \qquad (2.4)$$

For any scalar $\lambda$, $0 \le \lambda \le 1$, we define $y = \lambda y_1 + (1-\lambda) y_2$ and $\hat b = \lambda b(y_1) + (1-\lambda) b(y_2)$. Since $b(y)$ is convex, $b(y) \le \hat b$, and it follows that in the $x$-space the convex feasible region $A^T x \ge b(y)$ is not smaller than the region $A^T x \ge \hat b$. Hence

$$\min_x\,\{c^T x \mid A^T x \ge b(y)\} \le \min_x\,\{c^T x \mid A^T x \ge \hat b\}$$

Now

$$\varphi(y) = \min_x\,\{c^T x \mid A^T x \ge b(y)\} \le \min_x\,\{c^T x \mid A^T x \ge \hat b\} \le c^T\big(\lambda x_1 + (1-\lambda) x_2\big) = \lambda\,\varphi(y_1) + (1-\lambda)\,\varphi(y_2)$$

which proves the convexity of $\varphi(y)$. The second inequality follows from the fact that $\lambda x_1 + (1-\lambda) x_2$ is a feasible solution of $A^T x \ge \hat b$. This can be seen directly from (2.4).

To show that the region of definition of $\varphi(y)$ is convex we show that if $y_1$ and $y_2$ are feasible, then $y = \lambda y_1 + (1-\lambda) y_2$, $0 \le \lambda \le 1$, is also feasible. Since $y_1$ and $y_2$ are feasible, there exist vectors $x_1$ and $x_2$ such that $A^T x_1 \ge b(y_1)$ and $A^T x_2 \ge b(y_2)$. Then for $x = \lambda x_1 + (1-\lambda) x_2$ we have $A^T x \ge \hat b \ge b(y)$. Q.E.D.

This theorem applies directly to the functions $\varphi_i(y)$ defined by (1.1) for the $\ell$ subproblems, so that each of the functions $\varphi_i(y)$, $i = 1,\dots,\ell$, is convex. The sum of convex functions is also convex, so that $\Phi(y)$ as given by (1.2) is convex. An alternate proof of Theorem 1 can also be given based on a known result [1, 6] that $\varphi$ is a convex function of the right-hand side vector $b$.
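As a numerical illustration of Theorem 1, the sketch below evaluates $\varphi(y)$ for a small hypothetical single-block problem (rows of $A^T$ equal to $(1,0)$, $(0,1)$, $(1,1)$, $(1,3)$, $c = (1,1)$, and the convex right-hand side $b(y) = (0, 0, y^2, 1-y)$; these data are invented for illustration, not taken from the paper). It solves each two-variable LP by brute-force vertex enumeration, which is only practical for toy sizes, and then checks midpoint convexity on a grid. For this particular example one can verify by hand that $\varphi(y) = \max(y^2, (1-y)/3, 0)$.

```python
from itertools import combinations

# Hypothetical single-block data (not from the paper): x in R^2, s = 1.
A_T = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 3.0)]  # rows of A^T
c = (1.0, 1.0)


def b(y):
    """Convex right-hand side: two constants, y**2, and the linear 1 - y."""
    return [0.0, 0.0, y * y, 1.0 - y]


def phi(y, tol=1e-9):
    """phi(y) = min{c^T x | A^T x >= b(y)} by brute-force vertex enumeration.

    Valid here because x >= 0 is among the rows (so vertices exist) and c > 0
    (so the minimum is bounded and attained at a vertex).
    """
    rhs = b(y)
    best = float("inf")
    for i, j in combinations(range(len(A_T)), 2):
        (p, q), (r, s) = A_T[i], A_T[j]
        det = p * s - q * r
        if abs(det) < tol:
            continue  # parallel rows do not define a vertex
        # Cramer's rule with rows i and j satisfied as equalities.
        x1 = (rhs[i] * s - q * rhs[j]) / det
        x2 = (p * rhs[j] - r * rhs[i]) / det
        if all(a1 * x1 + a2 * x2 >= bk - tol for (a1, a2), bk in zip(A_T, rhs)):
            best = min(best, c[0] * x1 + c[1] * x2)
    return best


# Midpoint convexity check of phi on a grid, as Theorem 1 predicts.
grid = [k / 10.0 for k in range(-20, 21)]
for y1 in grid:
    for y2 in grid:
        assert phi(0.5 * (y1 + y2)) <= 0.5 * (phi(y1) + phi(y2)) + 1e-9
print("phi(0) =", phi(0.0))
```

A real implementation would of course use an LP code for each subproblem; enumeration is used here only to keep the sketch dependency-free.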

The original multiblock problem (2.1) has been restated as $\min \Phi(y)$ over the relatively small number of variables in the $y$ vector. Since $\Phi(y)$ is convex, it is only necessary to find a local minimum of $\Phi(y)$ to solve the original problem, since any local minimum is also a global minimum. This remark is, unfortunately, deceptively simple, since even the evaluation of $\Phi(y)$ over a coarse grid of, say, 10 points in each dimension of the $s$-dimensional $y$-space would require $10^s$ linear programming solutions of each of the $\ell$ subproblems (1.1). Furthermore, appropriate constraints in the $y$-space are required to ensure that only feasible values of $y$ are considered.

A practical way in which we can use the convexity of $\Phi(y)$ to solve the complete problem is based on the following remarks. For a specified value, say $y = y_0$, the solution of the $k$th subproblem (1.1) gives not only the corresponding vector $x_{k0}$ and function value $\varphi_k(y_0)$ but also the partial derivatives $\partial\varphi_k(y_0)/\partial y_j$, $j = 1,\dots,s$, where they exist. These derivatives are readily obtained from the subproblem optimal shadow prices and the partial derivatives $\partial b/\partial y_j$. They are valid in the $y$-space region containing $y_0$ for which the optimal basis at $y_0$ remains optimal. Furthermore, as long as the same basis is maintained the vector $x_k$ is given as an explicit function of $y$, so that all the constraints can be represented explicitly in the $y$-space. With this information a minimization over $y$ in a region containing $y_0$ can be carried out. For each such $y$-space minimization, only a single linear programming solution of each of the $\ell$ subproblems is now required. As will be shown in the next section, this sequence of minimizations gives the desired global minimum, where the Kuhn-Tucker conditions [5] for the complete problem are satisfied.

Fig. 2. Single block convex problem.

Since the complete problem is solved by solving a sequence of subproblems, it is essential that we be able to determine from the subproblem solutions when the Kuhn-Tucker conditions for the complete problem are satisfied. In order to simplify the discussion we consider only a single $x$-block, so that $\ell = 1$. The multiblock case is discussed at the end of this section. We therefore consider a single block problem in $x$ and $y$ (see Fig. 2).



$$\min_{x,\,y}\,\{c^T x \mid A^T x \ge b(y)\} \qquad (2.5)$$



Corresponding to this is the convex function $\varphi(y)$ given by (2.3). We consider a value $y = y_0$ and let $x_0$ be the corresponding solution of the following linear program.



Problem I

$$\varphi(y_0) = \min_x\,\{c^T x \mid A^T x \ge b(y_0)\} = c^T x_0 \qquad (2.6)$$



We will now obtain necessary and sufficient conditions that the point $(x_0, y_0)$ is a minimum point for the complete problem (2.5). These conditions will be given in terms of quantities obtained from the solution of two subproblems: Problem I above, and Problem II below.

The Kuhn-Tucker conditions that $x_0$ gives a minimum for Problem I are that $A^T x_0 \ge b(y_0)$ and that there exists a vector $r \ge 0$ such that

$$r^T\,[A^T x_0 - b(y_0)] = 0, \qquad A r = c \qquad (2.7)$$
Furthermore, by a basic theorem of linear programming there is an $(m \times m)$ nonsingular matrix $\bar A$, whose columns are selected from the columns of $A$, such that all the nonzero components of $r$ are included in the $m$-vector

$$\bar r = \bar A^{-1} c \ge 0 \qquad (2.8)$$

We let $\bar b(y)$ denote the $m$ components of $b(y)$ which correspond to the rows of $\bar A^T$. The corresponding rows of $A^T x_0 \ge b(y_0)$ are satisfied as equalities, so that

$$\bar A^T x_0 = \bar b(y_0) \qquad (2.9)$$
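The following sketch applies (2.7) through (2.9) with exact rational arithmetic to a small hypothetical problem (rows of $A^T$ equal to $(1,0)$, $(0,1)$, $(1,1)$, $(1,3)$, $c = (1,1)$, $b(y) = (0, 0, y^2, 1-y)$; invented for illustration). At $y_0 = 0$ it assumes that rows 0 and 3 of $A^T$ form the optimal basis, computes $\bar r$ from (2.8) and $x_0$ from (2.9), scatters $\bar r$ into the full vector $r$, and verifies the Kuhn-Tucker conditions (2.7). The helper `solve2` is a hypothetical name for a 2×2 Cramer's-rule solver.

```python
from fractions import Fraction as F

# Hypothetical data (not from the paper): m = 2, k = 4, s = 1.
A_T = [(F(1), F(0)), (F(0), F(1)), (F(1), F(1)), (F(1), F(3))]  # rows of A^T
c = [F(1), F(1)]


def b(y):
    return [F(0), F(0), y * y, 1 - y]


def solve2(M, v):
    """Exact solution z of M z = v for a 2x2 matrix M given as two rows."""
    (p, q), (t, w) = M
    det = p * w - q * t
    return [(v[0] * w - q * v[1]) / det, (p * v[1] - t * v[0]) / det]


y0 = F(0)
basis = [0, 3]                                   # assumed optimal rows of A^T
Abar_T = [A_T[j] for j in basis]                 # rows of Abar^T
Abar = [list(col) for col in zip(*Abar_T)]       # its transpose, Abar
bbar0 = [b(y0)[j] for j in basis]

rbar = solve2(Abar, c)        # (2.8): Abar rbar = c
x0 = solve2(Abar_T, bbar0)    # (2.9): Abar^T x0 = bbar(y0)

# Scatter rbar into the full k-vector r (zeros off the basis) and check (2.7).
r = [F(0)] * len(A_T)
for j, rj in zip(basis, rbar):
    r[j] = rj
slack = [a1 * x0[0] + a2 * x0[1] - bj for (a1, a2), bj in zip(A_T, b(y0))]
Ar = [sum(r[j] * A_T[j][i] for j in range(len(A_T))) for i in range(2)]

assert all(sk >= 0 for sk in slack)                      # A^T x0 >= b(y0)
assert min(r) >= 0                                       # r >= 0
assert sum(rj * sk for rj, sk in zip(r, slack)) == 0     # r^T [A^T x0 - b(y0)] = 0
assert Ar == c                                           # A r = c
print("rbar =", rbar, "x0 =", x0)
```

Exact fractions make the complementary-slackness test an equality check rather than a tolerance comparison, which is convenient for a hand-sized example.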



We also denote the columns of $A$ which are not in $\bar A$ by the matrix $B$, and the corresponding components of $b(y)$ by $e(y)$. This partition of the complete problem (2.5) is shown in Fig. 3, where it is assumed for purposes of illustration that $\bar A^T$ consists of the last $m$ rows of $A^T$. We also define the $m \times (k-m)$ matrix

$$Q = \bar A^{-1} B \qquad (2.10)$$

and the $(s \times k)$ matrix of partial derivatives (negative Jacobian)

$$D(y) = [d_{ij}(y)], \qquad d_{ij}(y) = -\frac{\partial b_j(y)}{\partial y_i}, \qquad i = 1,\dots,s, \quad j = 1,\dots,k \qquad (2.11)$$



The matrix $D(y)$ is also partitioned into two parts, an $(s \times m)$ matrix $\bar D(y)$ corresponding to $\bar b(y)$ and an $s \times (k-m)$ matrix $E(y)$ corresponding to $e(y)$. In terms of these quantities we formulate the second problem.

Fig. 3. Problem I optimal basis.

Problem II

$$\min_y\,\big\{\bar r^T \bar b(y) \;\big|\; [E(y_0) - \bar D(y_0)\,Q]^T (y - y_0) \ge e(y_0) - Q^T \bar b(y_0)\big\} \qquad (2.12)$$

This gives the complete problem (2.5) in terms of the $y$ variables only, by making the assumption that the $m$ active constraints given by (2.9) determine the dependence of $x$ on $y$,

$$x = (\bar A^{-1})^T\,\bar b(y) \qquad (2.13)$$

The nonlinear constraints of (2.5) have also been linearized about the point $y = y_0$. The objective function of Problem II is seen to be equal to $c^T x$ by using (2.8) and (2.13). It follows directly from the convexity of $\bar b(y)$ and the nonnegativity of $\bar r$ that $\bar r^T \bar b(y)$ is a convex function of $y$. Problem II therefore consists of the minimization of a convex function subject to linear constraints.

The Kuhn-Tucker conditions for Problem II at $y = y_0$ are that $y_0$ is feasible, that is

$$Q^T \bar b(y_0) - e(y_0) \ge 0 \qquad (2.14)$$

and that a vector $v \ge 0$ exists such that

$$v^T\,[Q^T \bar b(y_0) - e(y_0)] = 0 \qquad (2.15)$$

$$[E(y_0) - \bar D(y_0)\,Q]\,v = -\bar D(y_0)\,\bar r \qquad (2.16)$$

The last relation follows from the fact that $-\bar D(y_0)\,\bar r$ is the gradient of $\bar r^T \bar b(y)$ at $y = y_0$.
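A quick numerical check that $-\bar D(y_0)\,\bar r$ gives the derivative of $\varphi$ while the basis stays optimal. The sketch uses a hypothetical example with $s = 1$ (rows of $A^T$ equal to $(1,0)$, $(0,1)$, $(1,1)$, $(1,3)$, $c = (1,1)$, $b(y) = (0, 0, y^2, 1-y)$, not from the paper), assumes rows 0 and 3 form the optimal basis at $y_0 = 0$, and compares the shadow-price gradient with a central finite difference of the closed form $\varphi(y) = \max(y^2, (1-y)/3, 0)$ that holds for this example.

```python
# Hypothetical single-block example (not from the paper): s = 1, m = 2, k = 4.
A_cols = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 3.0)]  # columns of A
c = (1.0, 1.0)


def D(y):
    """Row of (2.11) for s = 1: d_j(y) = -db_j/dy with b(y) = (0, 0, y**2, 1 - y)."""
    return [0.0, 0.0, -2.0 * y, 1.0]


def phi(y):
    """Closed form of min{c^T x | A^T x >= b(y)} for this particular example."""
    return max(y * y, (1.0 - y) / 3.0, 0.0)


y0 = 0.0
basic = [0, 3]                                          # optimal basis at y0
(a, cc), (bb, d) = A_cols[basic[0]], A_cols[basic[1]]   # Abar = [[a, bb], [cc, d]]
det = a * d - bb * cc
rbar = [(c[0] * d - bb * c[1]) / det, (a * c[1] - cc * c[0]) / det]   # (2.8)

Dbar = [D(y0)[j] for j in basic]
grad = -sum(Dbar[i] * rbar[i] for i in range(2))        # -Dbar(y0) rbar

step = 1e-6
fd = (phi(y0 + step) - phi(y0 - step)) / (2.0 * step)  # central difference
print("shadow-price gradient:", grad, "finite difference:", fd)
```

The agreement holds only inside the region where the basis at $y_0$ remains optimal; at a point where the optimal basis changes, $\varphi$ need not be differentiable.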

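In terms of the partition of the original matrix $A$ into $\bar A$ and $B$, the optimality conditions for the complete problem (2.5) at $(x_0, y_0)$ are the feasibility requirements

$$\bar A^T x_0 - \bar b(y_0) = 0 \qquad (2.17)$$

$$B^T x_0 - e(y_0) \ge 0 \qquad (2.18)$$

and that nonnegative vectors $u$ and $v$ exist, such that

$$v^T\,[B^T x_0 - e(y_0)] = 0 \qquad (2.19)$$

$$\bar A u + B v = c \qquad (2.20)$$

$$\bar D(y_0)\,u + E(y_0)\,v = 0 \qquad (2.21)$$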

The relation between the subproblem optimal solutions and the complete problem optimum is given by Theorem 2.

Theorem 2

Let $x_0$, as given by (2.6), be the optimal solution to Problem I corresponding to $y_0$. Then necessary and sufficient conditions that $(x_0, y_0)$ be the optimum of the complete problem are that $y_0$ is the optimal solution to Problem II and that the vector $u$, given by

$$u = \bar r - Q v \qquad (2.22)$$

is nonnegative, where $v$ is the shadow price vector at the Problem II optimum.

Proof 

We first prove sufficiency by showing that (2.17) through (2.21) follow from the optimality of $x_0$ and the relations (2.15), (2.16), (2.22) and $u \ge 0$. Since $x_0$ is the optimal solution to Problem I, the relation (2.9) holds as well as $A^T x_0 \ge b(y_0)$. The feasibility requirements (2.17) and (2.18) follow directly. The relation (2.19) is shown to be equivalent to (2.15) by using (2.9) and (2.10). From (2.22) we have

$$\bar A u + \bar A Q v = \bar A \bar r$$

This, together with (2.8) and (2.10), gives (2.20). Also from (2.22) we have

$$\bar D(y_0)\,u + \bar D(y_0)\,Q v - \bar D(y_0)\,\bar r = 0$$

This, together with (2.16), gives (2.21). The vector $v$ is nonnegative since it is the optimal shadow price vector for Problem II. This completes the sufficiency proof.

To show necessity we assume (2.17) through (2.21) for nonnegative vectors $u$ and $v$, and show that $y_0$ is the optimal solution of Problem II and that (2.22) holds. The two relations (2.14) and (2.15) for Problem II follow directly from (2.18) and (2.19), respectively, by the use of (2.9) and (2.10). The relation (2.16) is obtained by multiplying (2.20) by $-\bar D(y_0)\,\bar A^{-1}$, adding to (2.21), and using (2.8) and (2.10). Since Problem II is the minimization of a convex function subject to linear constraints, these Kuhn-Tucker conditions are sufficient for $y_0$ to be its optimal solution. Finally, (2.22) is obtained by multiplying (2.20) by $\bar A^{-1}$ and using (2.8) and (2.10). Q.E.D.

Corollary 

A sufficient condition that $(x_0, y_0)$ is optimal for the complete problem is that $y_0$ is an interior (unconstrained) minimum for Problem II.

Proof

The Problem II shadow price vector $v$ is zero if $y_0$ is an interior minimum, since there are no active constraints at the minimum. By (2.8) we have $\bar r \ge 0$. Therefore by (2.22), $u = \bar r \ge 0$, and $(x_0, y_0)$ is the complete problem minimum. Q.E.D.
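To see Theorem 2 at work, the sketch below uses a hypothetical example with $s = 1$ (rows of $A^T$ equal to $(1,0)$, $(0,1)$, $(1,1)$, $(1,3)$, $c = (1,1)$, $b(y) = (0, 0, y^2, 1-y)$; not from the paper) at $y_0 = (\sqrt{13}-1)/6$, the point where $y_0^2 = (1-y_0)/3$. Keeping rows 0 and 3 as the basis, it forms $Q$ by (2.10), $h(y_0) = Q^T\bar b(y_0) - e(y_0)$ and the row $[E - \bar D Q]$, obtains $v$ from (2.16) for the single active constraint (a simplification that relies on $s = 1$ and one active constraint), and tests $u = \bar r - Qv$ from (2.22). With $v \ge 0$ and $u \ge 0$ the theorem declares $(x_0, y_0)$ optimal.

```python
import math

# Hypothetical toy data (not from the paper): s = 1, m = 2, k = 4.
A_cols = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 3.0)]   # columns of A
c = (1.0, 1.0)


def b(y):
    return [0.0, 0.0, y * y, 1.0 - y]


def D(y):
    """(2.11) with s = 1: d_j(y) = -db_j/dy."""
    return [0.0, 0.0, -2.0 * y, 1.0]


def solve2(col0, col1, rhs):
    """Solve [col0 col1] z = rhs for the 2x2 matrix with the given columns."""
    (a, cc), (bb, d) = col0, col1
    det = a * d - bb * cc
    return [(rhs[0] * d - bb * rhs[1]) / det, (a * rhs[1] - cc * rhs[0]) / det]


y0 = (math.sqrt(13.0) - 1.0) / 6.0        # y0**2 == (1 - y0)/3 here
basic, nonbasic = [0, 3], [1, 2]
col0, col1 = A_cols[basic[0]], A_cols[basic[1]]
rbar = solve2(col0, col1, c)                               # (2.8)
Q = [solve2(col0, col1, A_cols[j]) for j in nonbasic]      # (2.10), column per nonbasic row
by, Dy = b(y0), D(y0)
Dbar, E = [Dy[j] for j in basic], [Dy[j] for j in nonbasic]

# Problem II data at y0 (s = 1): residuals h(y0) and the row [E - Dbar Q].
h0 = [sum(Q[l][i] * by[basic[i]] for i in range(2)) - by[nonbasic[l]] for l in range(2)]
G = [E[l] - sum(Dbar[i] * Q[l][i] for i in range(2)) for l in range(2)]
grad = -sum(Dbar[i] * rbar[i] for i in range(2))           # gradient of rbar^T bbar(y)

# One active constraint and s = 1: (2.16) reduces to G[l] * v_l = grad.
active = [l for l in range(2) if abs(h0[l]) < 1e-12]
v = [grad / G[l] if l in active else 0.0 for l in range(2)]
u = [rbar[i] - sum(Q[l][i] * v[l] for l in range(2)) for i in range(2)]   # (2.22)
print("v =", v, "u =", u, "optimal:", min(v) >= 0 and min(u) >= 0)
```

For a general $s$ and several active constraints, $v$ would instead come from the Problem II solver (gradient projection in the paper), not from a scalar division.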
The theorem is basic to the solution of the complete problem by the partition programming algorithm in the following way. The linear Problem I is first solved by any dual method for $y = y_0$. Either a dual simplex method or the gradient projection method is suitable. The optimal vector $x_0$, the optimal basis $\bar A$, its inverse $\bar A^{-1}$ and the corresponding shadow price vector $\bar r$ are then available. The partial derivative matrix $D(y_0)$, its partitions $\bar D(y_0)$ and $E(y_0)$, and the matrix $Q$ as given by (2.10) are also computed. With these quantities we obtain the linear constraints and convex objective function for Problem II as given by (2.12). This problem is solved by means of the linear constraint version of gradient projection, for which an efficient computational program is available [10]. The optimal value $y = y_1$ and corresponding shadow price vector $v$ are obtained by gradient projection as described in Theorem 5 in Section 4. If $y_1 = y_0$, and $u$ as given by (2.22) is nonnegative, the point $(x_0, y_0)$ is the desired optimal solution of the complete problem by Theorem 2. On the other hand, with an arbitrary starting value $y_0$, it will generally be the case that these conditions will not be satisfied, so that $(x_0, y_0)$ is not the complete problem optimum. In that case we will show that a new value of $y$ can be found which decreases $\varphi(y)$. This is discussed in the next section.

In order to keep the presentation above from becoming too unwieldy we have considered the case with only a single block of inequalities, that is, with $\ell = 1$. In general, of course, we will wish to solve multiblock problems, so that it is essential that we be able to recognize the optimum for the complete multiblock problem in terms of each of the $\ell$ Problem I solutions and the single Problem II solution. It can be shown that Theorem 2 can be generalized to the multiblock case, ensuring that the partition programming algorithm is valid for the general problem. In the multiblock case the form of Problem I for each block is given by (1.1), for a fixed value of $y$. The way in which Problem II is defined for the multiblock case is described in Section 4. It is shown there how the information available from each of the $\ell$ Problem I optimal solutions is used to form a single Problem II in the $s$-dimensional $y$-space. The number of constraints in Problem II will depend on the number of blocks and the number of constraints in each block, but the number of variables in Problem II is always the same as the number of $y$ variables in the complete problem. Since computational time is determined primarily by the problem dimensionality (gradient projection being a dual algorithm), the time required to solve Problem II depends primarily on $s$ and only in a secondary way on the number of blocks.



3. ITERATIVE SOLUTION 

We will describe the iterative solution of Problems I and II by which the complete problem is solved. For simplicity we again consider the case $\ell = 1$, with only a single $x$-block as given by (2.5) and shown in Fig. 2. The actual multiblock algorithm is given in the next section, and it can be shown that the results given here are also valid for the multiblock problem (2.1).

An important aspect of the iterative procedure is the possibility of alternate optimal bases in Problem I. There may be alternate optimal bases corresponding to a point $(x_0, y_0)$ when $A^T x_0 \ge b(y_0)$ with at least $m+1$ of these inequalities satisfied as equalities. If there exists more than one nonsingular $(m \times m)$ matrix $\bar A^T$ with rows selected from the set of equalities such that $\bar A^{-1} c \ge 0$, then each such matrix corresponds to an alternate optimal basis. The particular optimal basis which is selected in Problem I will depend on the solution method and the choice of initial basis or starting value of $x$. There are, of course, only a finite number of possible alternate optimal bases.
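A small hypothetical case makes the definition concrete. Take $m = 2$, rows of $A^T$ equal to $(1,0)$, $(0,1)$, $(1,1)$, $(1,3)$ and $c = (1,1)$ (invented data), and suppose rows 0, 2 and 3 are all satisfied as equalities at a feasible $x_0$, that is, $m + 1 = 3$ equalities. The sketch enumerates every nonsingular $\bar A^T$ built from those rows and keeps the ones with $\bar A^{-1} c \ge 0$; each survivor is an alternate optimal basis. The function name `optimal_bases` is hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical data (not from the paper): m = 2, rows of A^T and c.
A_T = [(F(1), F(0)), (F(0), F(1)), (F(1), F(1)), (F(1), F(3))]
c = (F(1), F(1))
tight = [0, 2, 3]   # rows satisfied as equalities at the (degenerate) point x0


def optimal_bases(A_T, c, tight):
    """All (rows, rbar) with Abar^T built from m tight rows, nonsingular, rbar >= 0."""
    found = []
    for i, j in combinations(tight, 2):
        (p, t), (q, w) = A_T[i], A_T[j]     # Abar = [[p, q], [t, w]]
        det = p * w - q * t
        if det == 0:
            continue                        # singular: not a basis
        rbar = ((c[0] * w - q * c[1]) / det, (p * c[1] - t * c[0]) / det)
        if min(rbar) >= 0:                  # dual feasible at a primal-feasible point
            found.append(((i, j), rbar))
    return found


for rows, rbar in optimal_bases(A_T, c, tight):
    print(rows, [str(x) for x in rbar])
```

All three pairs qualify here, which is exactly the situation in which the iterative procedure may have to try more than one Problem I basis at the same $y$.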

We can now summarize the iterative solution procedure in terms of a typical cycle, the $j$th cycle. At the start of the $j$th cycle we have a feasible vector $y_j$ and the corresponding value $\varphi(y_j)$ of the convex function $\varphi(y)$ as given by (2.3). The corresponding linear Problem I is solved, giving the vector $x_j$, optimal basis $\bar A$, and shadow price vector $\bar r \ge 0$. The corresponding Problem II, with a convex objective function and linear constraints, is formulated and its global minimum obtained. This gives a feasible and optimal vector $y_{j+1}$, the value of the objective function $\varphi(y_{j+1})$, and the corresponding Problem II shadow price vector $v$. Note that the solution of the convex Problem II is not necessarily a finite procedure. The feasibility of $y_{j+1}$ follows from the fact that it satisfies $Q^T \bar b(y_{j+1}) - e(y_{j+1}) \ge 0$, as shown by Theorem 4. If $y_{j+1} = y_j$ and $u$ as given by (2.22) is nonnegative, we have the desired optimal solution to the complete problem. If $y_{j+1} = y_j$ and $u$ has at least one negative component, we formulate and solve Problem II for each alternate Problem I basis corresponding to $y_j$. If $\varphi(y_{j+1}) < \varphi(y_j)$, we go on to the next cycle with $y = y_{j+1}$.

Theorem 3 

This iterative procedure will find the complete problem minimum in a 
finite number of cycles. 

Proof 

It follows from (2.3) that the desired convex function $\varphi(y)$ is given by $\varphi(y) = \bar b^T(y)\,\bar A^{-1} c$, where $\bar A$ is some Problem I optimal basis whose choice depends on $y$. Each Problem II corresponds to a minimization over the feasible $y$-space for a particular selection of such an optimal basis. For each such selection the minimum over the feasible $y$-space is obtained. If we consider the $j$th cycle, we start with $y_j$ and obtain the optimal vector $x_j = (\bar A^{-1})^T \bar b(y_j)$, which satisfies $B^T x_j \ge e(y_j)$ or, using (2.10), $Q^T \bar b(y_j) - e(y_j) \ge 0$, so that $y_j$ is feasible for Problem II. It follows that the optimal vector $y_{j+1}$ for Problem II satisfies $\varphi(y_{j+1}) \le \varphi(y_j)$. The desired function $\varphi(y)$, in the neighborhood of $y_j$, is given by at least one of the alternate Problem I optimal bases. Trying each of these in turn, we will find a basis for which either the complete problem optimum conditions are satisfied [in which case $\varphi(y_j)$ is the complete problem minimum], or a value $y_{j+1}$ will be found such that $\varphi(y_{j+1}) < \varphi(y_j)$. In this latter case, the basis which gives $y_{j+1}$ must be different from the basis chosen in any previous cycle. This is a consequence of the fact that each value $\varphi(y_i)$, $i = 0,\dots,j$, is the minimum for the optimal basis selected in that cycle, and $\varphi(y_{j+1}) < \varphi(y_i)$, $i = 0,\dots,j$. Since there are only a finite number of possible Problem I bases, the number of cycles is finite. Q.E.D.

Two points related to this iterative solution should be emphasized. The 
first is that even when alternate optimal bases exist it is only necessary to 
solve Problem II for a relatively small number of these bases (often only 
one or two) in order to obtain a decrease in the objective function. A 
specific selection procedure is used to choose an alternate optimal basis 
and modify the inverse $\bar A^{-1}$ by a single change of basis to get the desired
new basis inverse. The selection is based on the most negative component 
of the vector u, and is described in the partition programming algorithm in 
the next section. The second point is that for a nonlinear vector b(y) the 
Problem II will usually also be nonlinear and must be solved by a con- 
vergent (but in general not finite) procedure. The gradient projection 
method is very well suited to this purpose and is used in the way described 
in the next section. Any other method suitable for minimizing a convex 
function subject to linear constraints could also be used. 

The justification for the constraint linearization in Problem II is based on the following theorem, which shows that if $y_0$ is not the optimal solution to the linearized Problem II, then a new feasible vector $y$ can be found such that $\varphi(y) < \varphi(y_0)$. We again consider the single $x$-block problem given by (2.5), the convex function $\varphi(y)$ given by (2.3), and Problems I and II given by (2.6) and (2.12).

Theorem 4 

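Let the optimal vector $y_1$ for Problem II be such that

$$\bar r^T \bar b(y_1) < \bar r^T \bar b(y_0) \qquad (3.1)$$

Then if

$$h(y) = Q^T \bar b(y) - e(y) \qquad (3.2)$$

is convex, we have

$$\varphi(y_1) < \varphi(y_0) \qquad (3.3)$$

For $h(y)$ not necessarily convex, let $\theta_m$ be the scalar solution of the one-dimensional maximization

$$\theta_m = \max\,\{\theta \mid h\big(y_0 + \theta(y_1 - y_0)\big) \ge 0,\ 0 \le \theta \le 1\} \qquad (3.4)$$

Then for $\theta_m > 0$ and $y = y_0 + \theta_m(y_1 - y_0)$ we have $\varphi(y) < \varphi(y_0)$.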
Proof

It follows from (2.10) and (3.2) that a vector $y$ and the corresponding value of $x = (\bar A^{-1})^T \bar b(y)$ are feasible for the complete problem (2.5) if, and only if, $h(y) \ge 0$. Since $x_0 = (\bar A^{-1})^T \bar b(y_0)$ is the optimal vector for Problem I with $y = y_0$, we have $h(y_0) \ge 0$ and $\bar r^T \bar b(y_0) = \varphi(y_0)$. For $h(y)$ convex

$$h(y) \ge h(y_0) + [E(y_0) - \bar D(y_0)\,Q]^T (y - y_0) \qquad (3.5)$$

since by (2.11) and (3.2) the Jacobian of $h(y)$ at $y_0$ is $[E(y_0) - \bar D(y_0)\,Q]$. Since $y_1$ is feasible for Problem II, the right-hand side of (3.5) is nonnegative at $y = y_1$. Therefore $h(y_1) \ge 0$. It follows that $x_1 = (\bar A^{-1})^T \bar b(y_1)$ and $y_1$ are feasible for the complete problem, with $c^T x_1 = \bar r^T \bar b(y_1)$. The minimum $\varphi(y_1)$ for $y = y_1$ must therefore satisfy $\varphi(y_1) \le \bar r^T \bar b(y_1) < \bar r^T \bar b(y_0) = \varphi(y_0)$, which gives (3.3).

From (3.4) we have $h(y) \ge 0$ for $y = y_0 + \theta_m(y_1 - y_0)$. The corresponding value of $c^T x$ is $\bar r^T \bar b(y)$, so that $\varphi(y) \le \bar r^T \bar b(y)$. Since $\bar r^T \bar b(y)$ is convex,

$$\bar r^T \bar b(y) \le \bar r^T \bar b(y_0) + \theta_m\,[\bar r^T \bar b(y_1) - \bar r^T \bar b(y_0)] < \bar r^T \bar b(y_0) \qquad (3.6)$$

for $\theta_m > 0$. Then $\varphi(y) \le \bar r^T \bar b(y) < \bar r^T \bar b(y_0) = \varphi(y_0)$. Q.E.D.
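The one-dimensional maximization (3.4) is easy to carry out numerically. In the hypothetical example used here (rows of $A^T$ equal to $(1,0)$, $(0,1)$, $(1,1)$, $(1,3)$, $c = (1,1)$, $b(y) = (0, 0, y^2, 1-y)$; not from the paper), the basis formed by rows 0 and 3 at $y_0 = 0$ gives $h(y) = ((1-y)/3,\ (1-y)/3 - y^2)$, which is not convex. Linearizing at $y_0 = 0$ gives the constraints $1/3 - y/3 \ge 0$, so Problem II returns $y_1 = 1$ with $\bar r^T \bar b(y_1) = 0 < 1/3$, but $h(y_1) = (0, -1)$ is infeasible. The sketch finds $\theta_m$ by bisection, under the stated assumption that feasibility along the segment is lost at most once, and confirms $\varphi(y) < \varphi(y_0)$ using the closed form $\varphi(y) = \max(y^2, (1-y)/3, 0)$ of this example.

```python
import math

# Hypothetical example (not from the paper): with basis rows 0 and 3 at y0 = 0,
#   h(y) = Q^T bbar(y) - e(y) = ((1 - y)/3, (1 - y)/3 - y**2),
# which is not convex, and the linearized Problem II returns y1 = 1.


def h(y):
    return [(1.0 - y) / 3.0, (1.0 - y) / 3.0 - y * y]


def phi(y):
    """Closed form of phi(y) for the same example."""
    return max(y * y, (1.0 - y) / 3.0, 0.0)


def theta_max(h, y0, y1, iters=60):
    """Largest theta in [0, 1] with h(y0 + theta (y1 - y0)) >= 0, by bisection.

    Assumes h(y0) >= 0 and that feasibility along the segment is lost at most
    once (true here); a general h would need a more careful 1-D search.
    """
    def point(t):
        return y0 + t * (y1 - y0)

    if min(h(point(1.0))) >= 0.0:
        return 1.0
    lo, hi = 0.0, 1.0              # feasible at lo, infeasible at hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if min(h(point(mid))) >= 0.0:
            lo = mid
        else:
            hi = mid
    return lo


y0, y1 = 0.0, 1.0
theta_m = theta_max(h, y0, y1)
y = y0 + theta_m * (y1 - y0)
exact = (math.sqrt(13.0) - 1.0) / 6.0      # root of (1 - y)/3 = y**2
print("theta_m =", theta_m, "(root:", exact, ")")
print("phi(y) =", phi(y), "< phi(y0) =", phi(y0))
```

In this example the shortened step happens to land on the minimizer of $\varphi$, but in general it only guarantees a strict decrease.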

For a completely linear problem, where $b(y)$ is linear, it should be noted that the Problem II constraints are identical to those for the complete problem. It follows that for the linear problem we always have $\varphi(y_1) = \bar r^T \bar b(y_1) < \varphi(y_0)$ whenever $\bar r^T \bar b(y_1) < \bar r^T \bar b(y_0)$.

On the basis of Theorem 4, we observe that all points $(x_j, y_j)$ obtained during the iterative procedure are feasible. Thus the procedure may be terminated at any cycle with a new feasible point which gives a lower value of the objective function than the starting point.



4. COMPUTATIONAL ALGORITHM 

The computational program for partition programming uses the gradient projection (GP) method to solve both the completely linear Problem I and the linear-constraint, convex-function Problem II. We will summarize the use of GP to solve a linear constraint problem by considering

$$\min_x\,\{p(x) \mid A^T x \ge b\} \qquad (4.1)$$

where $p(x)$ is a differentiable convex function of the $m$-dimensional vector $x$ with the gradient $g(x)$, $A$ is an $m$ by $k$ matrix and $b$ is a $k$-dimensional vector.

Theorem 5 

Let $x_0$ satisfy $A^T x_0 \ge b$. If $x_0$ satisfies the Kuhn-Tucker conditions for the problem (4.1), then GP will demonstrate this fact by finding a $k$-dimensional vector $r \ge 0$ such that $A r = g(x_0)$ and $r^T[A^T x_0 - b] = 0$. The $q$, $q \le m$, nonzero components of $r$ are given by the vector

$$\bar r = \bar A^{*} g(x_0) \ge 0 \qquad (4.2)$$

where the $m$ by $q$ matrix $\bar A$ consists of a set of $q$ linearly independent columns of $A$ which are selected by GP, and $\bar A^{*} = (\bar A^T \bar A)^{-1} \bar A^T$. Furthermore, the matrix $\bar A$ is such that $x_0 = (\bar A^{*})^T \bar b$, where $\bar b$ consists of the $q$ components of $b$ corresponding to the selected columns of $A$ (or rows of $A^T$).

If the point $x_0$ does not satisfy the Kuhn-Tucker conditions, then GP will find another feasible point which satisfies these conditions, along with the corresponding matrix $\bar A$, its pseudoinverse $\bar A^{*}$ and the nonnegative vector $\bar r$.

The proof that GP will recognize the optimum (first paragraph of 
theorem) has been given by Mangasarian [7]. The convergence to the 
optimum (second paragraph of theorem) has been proved in Part I of the 
GP paper [10]. 
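In the notation of Theorem 5, $\bar r = \bar A^{*} g(x_0)$ is the least-squares solution of $\bar A r = g(x_0)$, and the residual $g - \bar A \bar r = (I - \bar A \bar A^{*})\,g$ is the gradient projected onto the subspace orthogonal to the selected columns. A zero residual together with $\bar r \ge 0$ signals that the Kuhn-Tucker conditions hold for that choice of columns. The minimal pure-Python sketch below handles $q = 2$ columns in $m = 3$ dimensions; the column vectors and gradients are invented for illustration, and `ls_multipliers` is a hypothetical helper name.

```python
def dot(u, w):
    return sum(p * q for p, q in zip(u, w))


def ls_multipliers(a1, a2, g):
    """r = (Abar^T Abar)^{-1} Abar^T g for Abar = [a1 a2], and residual g - Abar r."""
    m11, m12, m22 = dot(a1, a1), dot(a1, a2), dot(a2, a2)
    t1, t2 = dot(a1, g), dot(a2, g)
    det = m11 * m22 - m12 * m12          # nonzero when a1, a2 are independent
    r = ((m22 * t1 - m12 * t2) / det, (m11 * t2 - m12 * t1) / det)
    residual = [gi - r[0] * x - r[1] * y for gi, x, y in zip(g, a1, a2)]
    return r, residual


# Hypothetical selected columns (m = 3, q = 2) and two gradient vectors.
a1, a2 = (1.0, 0.0, 0.0), (0.0, 1.0, 1.0)
r_ok, res_ok = ls_multipliers(a1, a2, (2.0, 3.0, 3.0))   # g lies in span{a1, a2}
r_no, res_no = ls_multipliers(a1, a2, (2.0, 3.0, 4.0))   # g does not
print(r_ok, res_ok)
print(r_no, res_no)
```

For larger $q$ one would factor $\bar A$ (for example by QR) rather than form the normal equations explicitly; the 2×2 normal equations are used here only to keep the sketch short.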

The way in which the GP optimization of appropriate subproblems is 
used to achieve an iterative solution of the complete multiblock problem
(2.1) is given in the Appendix to this paper. It is assumed that an initial 
feasible vector y is known. 

We now show how the general convex problem of minimizing a convex function subject to convex constraints can be put in the form (2.1). We consider the convex problem in the form

$$\min_y\,\{\psi(y) \mid \chi_i(y) \le 0,\ i = 1,\dots,q\} \qquad (4.3)$$

We introduce $q+1$ slack variables $x_i$, $i = 1,\dots,q+1$. Then an equivalent problem in the form (2.1) is given by

$$\min\ x_{q+1} \quad\text{subject to}\quad -x_i \ge \chi_i(y),\ \ x_i \ge 0,\ \ i = 1,\dots,q; \qquad x_{q+1} \ge \psi(y) \qquad (4.4)$$



If there are linear equalities in the $x_i$ and $y$, they can also be handled without difficulty. If such equalities occur in the $i$th block, they are handled by always including in the Problem I optimal basis $\bar A_i$ those rows of $A_i^T$ which correspond to the equality constraints.
APPENDIX

Partition Programming Algorithm 

1. An optimal solution to each of the linear subproblems (1.1) with $y = y_0$ is obtained using GP or an LP algorithm. This gives $\varphi_i(y_0)$, the optimal vector $x_{i0}$ and the basis $\bar A_i$ for each subproblem, $i = 1,\dots,\ell$. The basis $\bar A_i$ consists of $m_i$ linearly independent columns selected by GP from the original matrix $A_i$. The corresponding optimal shadow price vector is given by

$$\bar r_i = \bar A_i^{-1} c_i \ge 0$$

The columns of $A_i$ which are not in $\bar A_i$ are denoted by $B_i$. The $m_i$ components of $b_i(y)$ which correspond to the optimal basis $\bar A_i$ will be denoted by a vector $\bar b_i(y)$ and the remaining $(k_i - m_i)$ components by a vector $e_i(y)$. The complete problem objective function is then given by

$$\Phi(y_0) = \sum_{i=1}^{\ell} \varphi_i(y_0) = \sum_{i=1}^{\ell} \bar r_i^T \bar b_i(y_0)$$


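2. A convex problem in the $y$ variables with linearized constraints is now formed, as follows. For each block, the matrices $D_i(y_0)$, $i = 1,\dots,\ell$, are calculated as given by (2.11) for the single block case. Each matrix $D_i(y_0)$ is partitioned into $\bar D_i(y_0)$ and $E_i(y_0)$ corresponding to the vectors $\bar b_i(y_0)$ and $e_i(y_0)$. A matrix $Q_i$ is also obtained for each block according to

$$Q_i = \bar A_i^{-1} B_i$$

We now have the convex Problem II, with $s$ variables and $\sum_{i=1}^{\ell}(k_i - m_i)$ linear constraints

$$\min_y\,\big\{\hat\Phi(y) \;\big|\; h_i(y_0) + [E_i(y_0) - \bar D_i(y_0)\,Q_i]^T (y - y_0) \ge 0,\ i = 1,\dots,\ell\big\}$$

where $\hat\Phi(y) \equiv \sum_{i=1}^{\ell} \bar r_i^T \bar b_i(y)$ and $h_i(y) \equiv Q_i^T \bar b_i(y) - e_i(y)$, $i = 1,\dots,\ell$.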



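3. Problem II is solved by GP, as summarized in Theorem 5, giving an optimal vector $y = y_1$ and the corresponding value of the objective function $\hat\Phi(y_1)$. Since $y_0$ is feasible for Problem II and $\hat\Phi(y_0) = \sum_{i=1}^{\ell} \bar r_i^T \bar b_i(y_0) = \Phi(y_0)$, we have

$$\hat\Phi(y_1) \le \hat\Phi(y_0) = \Phi(y_0) \qquad \text{(A1)}$$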



CONVEX PARTITION PROGRAMMING 175 

The solution also gives the shadow price vector $v \ge 0$, with no more than $s$ positive components. We let $v_i$ be the vector whose $(k_i - m_i)$ components are the components of $v$ corresponding to the $i$th submatrix $A_i$. We define

$$u_i = \bar r_i - Q_i v_i, \qquad i = 1,\dots,\ell$$

4. There are four possibilities:

a. $y_1 = y_0$ and $u_i \ge 0$, $i = 1,\dots,\ell$. By Theorem 2 the complete problem optimal vector is $(x_{10}, x_{20}, \dots, x_{\ell 0}, y_0)$ and $\Phi(y_0)$ is the desired minimum value of the objective function. The shadow prices for the complete problem (one for each constraint) are given according to the submatrix in which the constraint occurs. For the $i$th submatrix, the vector $u_i$ gives the shadow prices for the $m_i$ constraints corresponding to $\bar A_i$. The vector $v_i$ gives the shadow prices for the remaining $(k_i - m_i)$ constraints.

b. $y_1 = y_0$ and at least one negative component of $u_i$ for at least one submatrix. For each submatrix we choose the most negative component of $u_i$, if any. The corresponding row of $\bar A_i^T$ is replaced with a row of $B_i^T$ for which the corresponding component of $v_i$ is positive. This gives a new nonsingular basis $\bar A_i'$. There will be at least one such selection from $B_i^T$ for which $(\bar A_i')^{-1} c_i \ge 0$, that is, $\bar A_i'$ is an optimal basis for the $i$th submatrix Problem I. Using this basis we form a new Problem II and continue the iteration with (3) above.

c. $\hat\Phi(y_1) < \hat\Phi(y_0)$ and $h_i(y_1) \ge 0$, $i = 1,\dots,\ell$. We have $\Phi(y_1) \le \hat\Phi(y_1)$, so that $\Phi(y_1) < \Phi(y_0)$ by (A1). This will always be the case for the completely linear problem, where $b_i(y)$ is given by (2.2). We continue the iteration with (2) above, with $y_1$ replacing $y_0$.

d. $\hat\Phi(y_1) < \hat\Phi(y_0)$ and at least one of the inequalities $h_i(y_1) \ge 0$, $i = 1,\dots,\ell$, is not satisfied. Let

$$\theta_m = \max\,\{\theta \mid h_i\big(y_0 + \theta(y_1 - y_0)\big) \ge 0,\ 0 \le \theta \le 1,\ i = 1,\dots,\ell\}$$

For $\theta_m > 0$ and $y = y_0 + \theta_m(y_1 - y_0)$ we have $\Phi(y) < \Phi(y_0)$ by Theorem 4. We continue the iteration with (2) above, with $y$ replacing $y_0$. If $\theta_m = 0$, we must have at least one component, say $h_{ij}(y)$, the $j$th component of the vector $h_i(y)$, such that $h_{ij}(y_0) = 0$ and $h_{ij}(y_1) < 0$. The corresponding row of $B_i^T$ is exchanged with an appropriate row of $\bar A_i^T$ to give a new optimal basis $\bar A_i'$. Using this new basis we form a new Problem II, and continue the iteration with (3) above.

This takes into account the possible alternatives during a cycle. A typical cycle starts with a feasible vector $y_0$ and ends either with a feasible vector $y_1$ with $\Phi(y_1) < \Phi(y_0)$, as given by 4c, or with a feasible vector $y$ with $\Phi(y) < \Phi(y_0)$, as given by 4d. As shown in Theorem 3, after a finite number of such cycles the conditions of 4a will be satisfied and the complete problem optimum has been obtained.
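The decision made in step 4 can be summarized as a small dispatch function. This is a hedged sketch, not the paper's program: the name `classify_step`, its argument layout and the tolerance are invented for illustration, the pivoting of case b and the line search of case d are left out, and it assumes, as cases c and d do, that $\hat\Phi(y_1) < \hat\Phi(y_0)$ whenever $y_1 \ne y_0$.

```python
def classify_step(y0, y1, u_blocks, h1_blocks, tol=1e-9):
    """Which case of step 4 applies: 'a', 'b', 'c' or 'd'.

    y0, y1     -- start point and Problem II optimum (sequences of length s)
    u_blocks   -- u_i = rbar_i - Q_i v_i for each block i
    h1_blocks  -- h_i(y1) = Q_i^T bbar_i(y1) - e_i(y1) for each block i
    Assumes Phihat(y1) < Phihat(y0) whenever y1 differs from y0.
    """
    if max(abs(p - q) for p, q in zip(y0, y1)) <= tol:
        if all(x >= -tol for u in u_blocks for x in u):
            return "a"    # Theorem 2: (x_10, ..., x_l0, y0) is optimal
        return "b"        # exchange the most negative u component with a row of B_i
    if all(x >= -tol for h in h1_blocks for x in h):
        return "c"        # y1 is feasible: return to step 2 with y1
    return "d"            # back off to y0 + theta_m (y1 - y0)


print(classify_step([0.0], [1.0], [[0.5, 0.2]], [[0.0, -1.0]]))
```

Flattening the per-block vectors inside `all(...)` keeps the function correct when some block has $k_i = m_i$ and therefore contributes an empty $h_i$.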
Experiments in Linear Programming



Philip Wolfe 
Leola Cutler 

INTRODUCTION 

There are many ways to solve linear programming problems. The 
earliest of these, Dantzig's "simplex method" [2], is the most widely 
used, and no equally effective alternative is available. Many variations 
of the original simplex method have been proposed in the last few years. 
Computational experience seems to us the only way to properly compare 
the computational efficiencies of these variations; their behavior depends 
so strongly on features of the process which cannot be known in advance 
that a priori estimates of their effectiveness inspire little confidence. The 
purpose of the work reported here has been to compare some of the out- 
standing variations with each other in their work on actual linear pro- 
gramming problems, and to set some bench marks against which other 
procedures may be measured. 

Under the title of "SCEMP" (Standardized Computational Experiments in Mathematical Programming), this work originated in 1960 at a meeting of the Linear Programming Committee of the SHARE organization, when it was suggested that some of the flexible linear programming routines then forthcoming might serve the task of evaluating the alternative procedures that had been discussed there. The Committee maintains a file of test problems from which those used here were selected; they are described in detail in the next section. A set of statistic-collecting routines,
modelled on an all-in-core, FORTRAN-coded linear programming routine 
for the IBM 704 and 7090 [12], was coded and served as the basis for the 
computer routines used in the present tests. (The routines and the output 
of the tests have been retained and can be made available, but the routines 
are not recommended for general purposes.) 

The nature of the output of these routines has been given in detail 
elsewhere [13]. Briefly, it consists in the following quantities for each 
simplex method iteration: the amount of infeasibility; the current value 
of the objective; the pivot row and column; the determinant of the basis; 
the number of product-form transformation entries; the number of
arithmetic operations performed in each of several major subdivisions of 



†This research was sponsored by U.S. Air Force Project RAND. It
an iteration; and the number of nonzero elements in certain arrays of interest. (The terms used here are defined in Sections 3 and 8.) At the end of a problem the complete solutions are given, as well as the "errors," the extent to which the final solution fails of being both primal and dual feasible. All solutions obtained have been checked with those obtained by other routines on the same problems, and the statistic-collecting features have been checked in detail for most of the runs by hand calculation of a small problem [10].

The experimental data are organized by "runs," each of which consists 
in the solution of an entire set of test problems by means of a routine em- 
bodying a particular algorithm variation. Of the 47 runs done so far in the 
SCEMP project, 29 furnish the data used in this report; the others bear on 
matters not discussed here. Two kinds of data pertaining to a run have 
been used in this report: we consider the number of simplex method iterations, or changes of basis, required to reach a certain end (either the first feasible solution or the optimal solution of the problem) in Sections 4-7; and we discuss the total number of arithmetic operations required in
Sections 8 and 9. The Appendix lists the raw data from which the figures 
presented in the sequel have been calculated. 

Since the point of most of these experiments has been to compare alternative methods, the following general format has been used for the results. The appropriate data (e.g., number of iterations) for a particular run are chosen as a base. In order to compare another run with the base, the corresponding datum obtained in the comparison run for each of the test problems is divided by the corresponding datum for the base run; the resulting ratio is the proportion to which the measure has been reduced by use of the compared procedure. For example, suppose that Algorithm I took 20 iterations to solve problem 1D and 30 to solve problem 2A, and that Algorithm II took 14 and 24 iterations, respectively. Choosing Algorithm I as the base, the comparative results would be given as in Table 1-1.

Table 1-1
ALGORITHM II COMPARED WITH ALGORITHM I

Problem      1D     2A    avg.   c.v.
Alg. II     .70    .80    .75    .07

Note that usually the average of the ratios is given, as well as their coefficient of variation (the standard deviation divided by the average). The problems will be listed in order of their numbers of constraints. The ratios all have equal weights in the averaging, but the average could be viewed as an average of the data of the compared run weighted by the reciprocals of the corresponding data of the base run. For this reason, the average is a somewhat fairer measure when the data of the base are larger than those of the compared run. Owing to the arithmetic of averaging (the average of reciprocals is never less than the reciprocal of the average), if II were chosen as the base and I as the compared run, the resulting average would be greater than the reciprocal of that of Table 1-1.
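The comparison format can be reproduced in a few lines. The Python sketch below is illustrative only: `compare_runs` is a hypothetical name, and it assumes the population standard deviation, which gives the c.v. of .07 shown in Table 1-1 (the sample standard deviation would give about .09). The last lines swap the roles of base and compared run to show the averaging effect just described.

```python
from statistics import mean, pstdev


def compare_runs(base, compared):
    """Ratios compared/base per problem, their average, and coefficient of variation.

    Hypothetical helper; uses the population standard deviation (an assumption).
    """
    ratios = [c / b for b, c in zip(base, compared)]
    avg = mean(ratios)
    cv = pstdev(ratios) / avg
    return ratios, avg, cv


# Iterations for problems 1D and 2A, from the example in the text.
alg_1 = [20, 30]
alg_2 = [14, 24]

ratios, avg, cv = compare_runs(alg_1, alg_2)
print([round(r, 2) for r in ratios], round(avg, 2), round(cv, 2))  # [0.7, 0.8] 0.75 0.07

# Swapping base and compared run: the average exceeds 1/0.75 = 1.333...
_, avg_swapped, _ = compare_runs(alg_2, alg_1)
print(round(avg_swapped, 3), avg_swapped > 1 / avg)  # 1.339 True
```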

Some gaps appear in the tables that follow. The largest problem cannot be run on the routines using the standard form of the simplex method, and two other problems were omitted from some "feasible solution" runs because they had starting feasible solutions.

A good deal of special terminology is used in describing the computations. Special terms are usually defined in context at their first appearance, which is signaled by underlining. Most of them are introduced in Sections 3 and 8. While the terms "method" and "procedure" are used interchangeably in a very general way, we use the term algorithm to refer to any particular version of the simplex method which chooses the pivot column and the pivot row in a particular manner, regardless of the way in which the data used in making the choice are obtained. Thus Sections 3 to 7 study only algorithms and the data (principally iteration counts) associated with them, while the remaining sections study, in part, different methods of performing the same algorithm.

We are indebted to many people for assistance with SCEMP. Marvin 
Shapiro and Richard Clasen at RAND did a substantial part of the computer 
programming. Many members of the SHARE Linear Programming Project, 
notably David M. Smith and L. Wheaton Smith, offered valuable advice. 
Much of the computing labor was defrayed through generous donations of 
time by C-E-I-R, Inc., Esso Research and Engineering Co., Phillips 
Petroleum Co., Shell Oil Co., Socony-Mobil Oil Co., and Standard Oil Co.
of California. 



2. THE PROBLEMS 

The linear programming problems on which our experiments were conducted were drawn from the file of thirteen problems maintained by the Test Problems and Experiments Committee of the SHARE Linear Programming Project. The problems, submitted by various members of the Committee in 1959 and 1960, were all used as production problems in their businesses; the majority arose in oil refining studies. None were especially constructed for test purposes, or thought "pathological." The original problems are available through the Committee.

Four of the problems were not used here. Problem 1C is too small, 4A is too large, and 3A and 3B had awkward input features. Thus our work was done with the nine problems of Table 2-1.

Throughout this report the problems are listed in the order of their 
numbers of constraints. In Table 2-1, the "name" identifies a problem in 
the Committee's files. All the problems are formulated as problems of 
minimizing a linear objective function under linear equality constraints. 
The objective functions are entered as rows of data, as are the constraints; 
some problems have several alternative objective functions, so that there 
are always one or more "additional rows." In our runs the highest-numbered objective row was used and the remainder ignored.
Table 2-1
THE TEST PROBLEMS

Name                            1D    2A    1E    1A    5A    1G    1F    2B    1B
Number of constraints: M        27    30    31    33    34    48    66    96   117
Number of additional rows: K     1     2     8     1     1     1     1     3     1
Number of variables             45   103   106    64    78   102   135   162   253
Number of entries              252   811   855   245   391   462   644   897  1210



The "number of variables" includes all the variables of the problem but no "artificial" variables. The "number of entries" is the number of nonzero quantities appearing among the constraints and objectives. Some further data regarding starting bases for the problems are given in Section 4.



3. TERMINOLOGY 

In order to describe the algorithms studied, we develop here some of the terminology connected with the simplex method. It is not intended to discuss the procedure itself, which is done in many standard works [5, 6]. The discussion in this section is entirely in terms of the standard form of the simplex method; the other forms are dealt with in Section 8.

Let a linear programming problem have N variables $x_1, \ldots, x_N$ and M equation constraints. At any stage in the simplex method solution there is defined a basis, which is a set of M basic variables, say $x_{j_1}, \ldots, x_{j_M}$; let the remaining variables be $x_{j_{M+1}}, \ldots, x_{j_N}$. The current tableau is the set of coefficients of the linear equations

$$
\begin{aligned}
x_{j_1} + a_{11}\, x_{j_{M+1}} + \cdots + a_{1,N-M}\, x_{j_N} &= b_1 \\
&\;\;\vdots \\
x_{j_M} + a_{M1}\, x_{j_{M+1}} + \cdots + a_{M,N-M}\, x_{j_N} &= b_M
\end{aligned}
$$

which are uniquely defined by the current basis and the requirement that this set be equivalent to the linear equations defining the original problem. We say that the basic variable $x_{j_i}$ occupies position i in the basis for $i = 1, \ldots, M$.

It is further supposed that the objective function to be minimized is expressed at this time in terms of the nonbasic variables as

$$
z = z_0 + c_1\, x_{j_{M+1}} + \cdots + c_{N-M}\, x_{j_N};
$$

the coefficients $c_j$ are the reduced costs. (The quantities $a_{ij}$ and $c_j$ defined here commonly carry a superior bar to indicate that they change in each iteration; we omit the bar.) The basic solution of the equations above is obtained by setting all nonbasic variables to zero, giving the basic variables the values $x_{j_1} = b_1$, etc., and the objective the value $z_0$.

In a single iteration of the simplex method a pivot column J (where $j_{M+J}$ is the index of a nonbasic variable) and a pivot row I of the tableau are chosen, and the roles of the basic variable occupying position I and the nonbasic variable associated with column J are interchanged, new data of the form of the equations above being obtained by pivoting on the entry $a_{IJ}$ of the tableau. The value of the objective changes by the amount

$$
c_J\, b_I / a_{IJ}.
$$
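For readers who want to check the pivoting step numerically, here is a minimal Python sketch of one exchange in the tableau convention above. It assumes dense lists of floats and performs no scaling or tolerance checks; the function and argument names are illustrative, not those of the RAND routines. The usage line exchanges $x_1$ and $x_2$ in a one-row problem, where $z_0$ changes by $c_J b_I / a_{IJ} = (-3)(4)/2 = -6$.

```python
from typing import List

Matrix = List[List[float]]


def pivot(a: Matrix, b: List[float], c: List[float], z0: float,
          basis: List[int], nonbasic: List[int], I: int, J: int):
    """Exchange the basic variable in position I with the nonbasic variable of column J.

    Tableau convention (as in the text):
        x[basis[i]] + sum_j a[i][j] * x[nonbasic[j]] = b[i]
        z = z0 + sum_j c[j] * x[nonbasic[j]]
    After the exchange, column J holds the variable that left the basis.
    Inputs are not modified; new (a, b, c, z0, basis, nonbasic) are returned.
    """
    p = a[I][J]
    if p == 0:
        raise ValueError("pivot entry a[I][J] must be nonzero")
    m, n = len(a), len(a[0])
    new_a = [row[:] for row in a]
    new_b = b[:]
    # Pivot row: divide by p; the leaving variable's coefficient becomes 1/p.
    new_a[I] = [v / p for v in a[I]]
    new_a[I][J] = 1.0 / p
    new_b[I] = b[I] / p
    # Other rows: eliminate the entering variable using the pivot row.
    for i in range(m):
        if i == I:
            continue
        f = a[i][J]
        new_a[i] = [a[i][j] - f * a[I][j] / p for j in range(n)]
        new_a[i][J] = -f / p
        new_b[i] = b[i] - f * b[I] / p
    # Objective row: same elimination; z0 changes by c_J * b_I / a_IJ.
    new_c = [c[j] - c[J] * a[I][j] / p for j in range(n)]
    new_c[J] = -c[J] / p
    new_z0 = z0 + c[J] * b[I] / p
    new_basis, new_nonbasic = basis[:], nonbasic[:]
    new_basis[I], new_nonbasic[J] = nonbasic[J], basis[I]
    return new_a, new_b, new_c, new_z0, new_basis, new_nonbasic


# One constraint x1 + 2*x2 = 4, objective z = 0 - 3*x2; bring x2 into the basis.
print(*pivot([[2.0]], [4.0], [-3.0], 0.0, [1], [2], 0, 0))  # [[0.5]] [2.0] [1.5] -6.0 [2] [1]
```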

Some M variables must be chosen as the starting basis for the procedure. If not all these variables belong to those of the original problem, the remainder are artificial. Each instance of a basic variable whose current value is negative, or of an artificial variable whose value is not zero, is an infeasibility. A basic solution having no infeasibilities is feasible. When it is necessary to adjoin artificial variables in order to have a starting basis, we always adjoin for each a column of coefficients of the form $[0, \ldots, 1, \ldots, 0]$ to the original problem, the single "1" of the column lying in a row corresponding to an otherwise unoccupied position of the basis. When there are infeasibilities, a separate objective function involving them is defined, and it is required that this infeasibility objective be minimized. The process of minimizing that objective is Phase One; the subsequent minimization of the proper objective, once a feasible solution is obtained, is Phase Two.

In the sequel we refer to the ordinary simplex method, by which we 
mean the simplex method as most commonly presented, except that we 
extend the usual procedure for the choice of pivot row to that of the 
"composite algorithm" [3]. 

In Phase Two the procedure is quite ordinary. A pivot column J is chosen so that $c_J$ is minimal (if all $c_j$ are nonnegative, the current solution is optimal). Then the pivot row I is chosen so that after pivoting the current solution will still be nonnegative: I is the i which minimizes $b_i/a_{iJ}$ over all i with $a_{iJ} > 0$. If $b_I = 0$ (degeneracy) should happen, then I is chosen as the i maximizing $a_{iJ}$ among all i for which $b_i = 0$. (This rule is not known to prevent "cycling," but is very effective in practice [15].)

In Phase One the objective is defined as the sum of the infeasibilities:

$$
\sum \{\, |x_j| : x_j < 0 \text{ or } x_j \text{ artificial} \,\}.
$$

The reduced cost for the nonbasic variable j is then

$$
\sum \{\, a_{ij} : b_i < 0 \,\} - \sum \{\, a_{ij} : b_i > 0 \text{ and position } i \text{ artificial} \,\}.
$$

The pivot column J is chosen for minimal reduced cost, and the pivot row I so that no variable nonnegative in the current solution becomes negative after pivoting: I is the i which achieves the smaller of the two ratios

$$
\min_i \{\, b_i/a_{iJ} : b_i \ge 0,\ a_{iJ} > 0 \,\}, \qquad \max_i \{\, b_i/a_{iJ} : b_i < 0,\ a_{iJ} < 0 \,\}.
$$

A somewhat more complicated rule is needed for degeneracy. In the absence of negative $b_i$ the pivot-row rule operates just as in Phase Two.
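A sketch of the Phase One pricing and the composite row rule follows. It is a hypothetical Python helper, assuming the same tableau lists plus a flag per basis position marking artificials; it omits the more complicated degeneracy rule mentioned above.

```python
from typing import List, Optional, Tuple


def phase_one_pivot(a: List[List[float]], b: List[float],
                    artificial: List[bool]) -> Optional[Tuple[int, int]]:
    """Ordinary Phase One choice of (I, J) under the composite row rule.

    artificial[i] is True when position i is occupied by an artificial variable.
    Returns None when no reduced cost is negative. Degeneracy is not handled.
    """
    m, n = len(a), len(a[0])
    # Reduced costs of the sum-of-infeasibilities objective.
    w = [sum(a[i][j] for i in range(m) if b[i] < 0)
         - sum(a[i][j] for i in range(m) if b[i] > 0 and artificial[i])
         for j in range(n)]
    J = min(range(n), key=lambda j: w[j])
    if w[J] >= 0:
        return None
    # Nonnegative variables that would be driven negative: smallest ratio binds.
    keep = [(b[i] / a[i][J], i) for i in range(m) if b[i] >= 0 and a[i][J] > 0]
    # Negative variables rising toward zero: the largest ratio makes all of them feasible.
    fix = [(b[i] / a[i][J], i) for i in range(m) if b[i] < 0 and a[i][J] < 0]
    candidates = ([min(keep)] if keep else []) + ([max(fix)] if fix else [])
    # w[J] < 0 guarantees that at least one candidate exists.
    return min(candidates)[1], J
```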
4. STARTING BASES

The basis with which a problem is started naturally has a great influence on the number of iterations required to solve it. In practice one often attempts to guess a starting basis which will be as nearly feasible and optimal as possible; a sophisticated routine will make good use of such a guess even if the basis is incomplete, infeasible, or singular. The three methods studied here do not, of course, make any use of special information about the problem; they assume complete ignorance, and may be used with any problem.

N basis: When no starting basis is specified, a full set of M artificial 
variables is adjoined to the problem and constitutes the starting basis. 

S basis: By singleton we mean a variable having only one nonzero entry, 
and that positive, in the equations of the initial tableau. An S basis is a 
starting basis consisting of a maximal set of singletons, with artificial 
variables used as necessary for the unfilled positions. (The computational 
cost of pivoting on singletons is almost nothing, and feasibility is improved 
if all the original right-hand sides are nonnegative.) 
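Finding an S basis is a single scan of the columns. The Python sketch below is one way to do it, assuming a dense constraint matrix; it takes the first singleton found for each position, and since each singleton covers exactly one position, this greedy choice already fills every position that any singleton can fill. The name `s_basis` is hypothetical.

```python
from typing import List, Optional


def s_basis(A: List[List[float]]) -> List[Optional[int]]:
    """Choose a maximal set of singletons for an S basis.

    A is the M x N constraint matrix of the initial tableau. Returns owner, where
    owner[i] is the singleton column placed in position i, or None when an
    artificial variable must fill that position.
    """
    m, n = len(A), len(A[0])
    owner: List[Optional[int]] = [None] * m
    for k in range(n):
        rows = [i for i in range(m) if A[i][k] != 0]
        # A singleton has exactly one nonzero entry, and that entry is positive.
        if len(rows) == 1 and A[rows[0]][k] > 0 and owner[rows[0]] is None:
            owner[rows[0]] = k
    return owner
```

Here column 3 of the example below is not used because its only nonzero entry is negative, so position 1 is left for an artificial variable.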

F basis: A full basis was produced by this procedure: first, an S basis 
was chosen; subsequently, each column of the tableau was examined, and 
pivoted into the basis if it had a nonzero entry corresponding to any unfilled 
position. The only basis positions left unfilled by this procedure are those 
corresponding to redundant constraints. Naturally, the resulting basis is 
not likely to be primal or dual feasible. Other procedures for obtaining a 
full basis have been tried but not yet fully evaluated; they do not seem to 
offer much advantage over the above. 
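The F-basis construction can be sketched with dense Gauss-Jordan pivoting on the augmented matrix [A | b]. This is an illustrative Python version, not the original code: the text does not say which unfilled position is used when a column has several nonzero candidates, so the sketch assumes the one with the largest entry in magnitude, and `f_basis` is a hypothetical name. Positions left as `None` correspond to redundant constraints.

```python
from typing import List, Optional, Tuple


def f_basis(A: List[List[float]], b: List[float],
            tol: float = 1e-12) -> Tuple[List[Optional[int]], List[float]]:
    """Build an F basis: an S basis first, then fill unfilled positions by pivoting.

    Returns (owner, values): owner[i] is the column basic in position i (None means
    the position stayed artificial), and values[i] is that variable's value.
    """
    m, n = len(A), len(A[0])
    T = [list(map(float, A[i])) + [float(b[i])] for i in range(m)]
    owner: List[Optional[int]] = [None] * m

    def pivot_on(r: int, k: int) -> None:
        p = T[r][k]
        T[r] = [v / p for v in T[r]]
        for i in range(m):
            if i != r and T[i][k] != 0:
                f = T[i][k]
                T[i] = [vi - f * vr for vi, vr in zip(T[i], T[r])]
        owner[r] = k

    # Step 1: the S basis (pivoting on a singleton only rescales its row).
    for k in range(n):
        rows = [i for i in range(m) if abs(A[i][k]) > tol]
        if len(rows) == 1 and A[rows[0]][k] > 0 and owner[rows[0]] is None:
            pivot_on(rows[0], k)
    # Step 2: pivot in any column with a nonzero entry in an unfilled position.
    for k in range(n):
        if k in owner:
            continue
        free = [i for i in range(m) if owner[i] is None and abs(T[i][k]) > tol]
        if free:
            # Assumption: use the largest entry among the unfilled positions.
            pivot_on(max(free, key=lambda i: abs(T[i][k])), k)
    return owner, [T[i][n] for i in range(m)]
```

For example, with a redundant second row, `f_basis([[1, 1], [2, 2]], [2, 4])` returns `([None, 0], [0.0, 2.0])`: one position cannot be filled.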

Table 4-1 describes the bases resulting from the use of procedures S 
and F. All the data are proportions, the number of variables in a given 
category being divided by the number of constraints in the problem. The 
last two lines constitute the proportion of infeasibilities in the starting 
basis. Note, however, that artificial variables initially at zero level tend 
to become nonzero before they are eliminated, so that for an S basis the 
total number of artificials is the better measure of infeasibility. 

Table 4-1
STARTING BASIS CHARACTERISTICS

Problem                1D    2A    1E    1A    5A    1G    1F    2B    1B

(S basis)
Singletons used       .19   .87   .19   .30  1.00   .90   .65   .86   .29
Positive artificials  .81   .00   .19   .70   .00   .10   .35   .05   .60
Zero artificials      .00   .13   .61   .00   .00   .00   .00   .08   .11

(F basis)
Negative variables    .41   .00   .42   .30   .00   .23   .26   .10   .46
Note that the proportion of infeasibilities in the F basis runs a little more than half the proportion in the S basis. We view this as accounting for the advantage, to be seen below, of the F basis over the S basis.

We are mainly interested in the number of simplex method iterations 
required to obtain the first feasible solution after the starting basis has 
been constructed. (In Section 9 the effect of the work required to produce 
the starting basis, as included in the total work to solve the problem, is 
considered.) Bases N and S have been used with two algorithms: the 
ordinary procedure and the "ratio pricing" procedure, described in 
Section 6. The results are summarized in Table 4-2. The first line 
compares basis S (run 21) with basis N (run 5, used as the base), for the 
ordinary algorithm; the second line compares basis S (run 6) with basis N 
(run 8, used as base) for the ratio pricing algorithm. The ratios thus represent the proportion to which the number of iterations in Phase One is reduced by using an S basis rather than an N basis.

Table 4-2
S BASIS COMPARED TO N BASIS FOR TWO ALGORITHMS

Problem          1D    2A    1E    1A    5A    1G    1F    2B    1B   avg.
Ordinary alg.   .89   .00   .69   .61   .00   .51   .42   .83    -    .50
Ratio pricing   .86   .01   .85   .82   .00   .40   .58   .83    -    .54

Evidently use of an S basis entails, on the average, a saving of about 48% 
in the number of iterations required for Phase One. 

Comparison of bases S and F has been made in each of three algorithms, with the number of iterations for basis S taken as the base data: the ordinary algorithm (runs 21 and 39, respectively); the sequential procedure (runs 31 and 36); and the least-infeasibility procedure (runs 33 and 37). The last two procedures are discussed in Section 5. Table 4-3 summarizes these, omitting problems 2A and 5A because their starting S and F bases are feasible.

Table 4-3
F BASIS COMPARED TO S BASIS

Problem          1D    1E    1A    1G    1F    2B    1B   avg.
Ordinary alg.   .32   .71   .48  1.62   .05   .69   .53   .63
Sequential      .37  1.00   .37  2.43   .11  1.07   .55   .84
Least infeas.   .33   .94   .35  2.27   .09   .93   .78   .81

The over-all average of these proportions is 0.76, predicting a saving of about 24% in use of an F basis rather than an S basis.
We conclude that in the absence of other knowledge of the problem, an F basis should be used. Some linear programming routines [1] make it possible to use a mixed procedure, entering a known partial basis and subsequently completing it in an arbitrary manner.

We may try to predict the number of iterations Phase One requires using the ordinary algorithm. Each entry in Table 4-4 is obtained by averaging, for all problems, the number of iterations taken using the basis N, S, or F divided by one of three possible measures of problem difficulty: the number, M, of constraints; the number of nonsingletons; or the number of infeasibilities in an F basis. Thus, for example, the number of iterations required using an S basis is expected to be 0.78 M. The coefficients of variation are given in parentheses. It is disappointing that the number of constraints is a better basis for prediction than the more informative measures.

Table 4-4
PHASE ONE ITERATIONS VERSUS MEASURES OF
PROBLEM DIFFICULTY

                        Measure
Basis    M            Number of        Number of negatives
                      nonsingletons    in F basis
N        1.69 (.3)
F         .56 (.6)    2.07 (1.1)       2.12 (.8)

5. THE FEASIBLE SOLUTION 

In general Phase One, the task of obtaining a first feasible solution, is accomplished by employing the simplex method to minimize some measure of the infeasibility of a solution. The five procedures studied here employ four different measures of infeasibility. In all of them the measure constitutes an objective function whose reduced costs are calculated so that the choice of pivot column can be made by the ordinary rule. In all but the "extended composite" algorithm the ordinary rule of pivot row selection is used.

The ordinary procedure is described in Section 3. 

The extended composite procedure [14] differs from the ordinary in choice of pivot row. After the pivot column has been chosen in the ordinary way, the pivot row is selected so that the sum of infeasibilities after pivoting will be minimized; variables are allowed to change sign freely. Thus I is defined by $\theta = b_I/a_{IJ}$, where $\theta$ minimizes

$$
\sum_i \{\, |b_i - a_{iJ}\theta| : b_i - a_{iJ}\theta \text{ is infeasible} \,\}.
$$

In the sequential procedure, the infeasibility is corrected one component at a time, in order. At any iteration, let $i_0$ be the least i for which $b_i$ is infeasible, and $x_r$ the corresponding variable. The objective for minimization is defined as $x_r$ if position $i_0$ is artificial and $b_{i_0}$ is positive, or as $-x_r$ if $b_{i_0}$ is negative. (The reduced cost for column j will then be just $-a_{i_0 j}$ or $a_{i_0 j}$, respectively.) During the procedure, row $i_0$ will be made feasible, feasibility on the previous rows being preserved.

The least-infeasibility procedure is like the sequential, except that at each iteration the index $i_0$ is taken so that $|x_r|$ is minimal among all infeasible variables; the index may increase or decrease.
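Both single-variable objectives reduce to choosing a position $i_0$ and reading off one tableau row. The following Python sketch illustrates that under stated assumptions: it measures "least infeasibility" by $|b_i|$, breaks ties by the lower index, and uses a hypothetical function name.

```python
from typing import List, Optional, Tuple


def single_infeasibility_costs(a: List[List[float]], b: List[float],
                               artificial: List[bool],
                               rule: str = "sequential") -> Optional[Tuple[int, List[float]]]:
    """Return (i0, reduced costs) for the one-variable Phase One objective.

    rule="sequential": i0 is the least infeasible position.
    rule="least-infeasibility": i0 is the position whose |b_i| is smallest.
    Returns None when the current solution is feasible.
    """
    infeasible = [i for i in range(len(b))
                  if b[i] < 0 or (artificial[i] and b[i] > 0)]
    if not infeasible:
        return None
    if rule == "sequential":
        i0 = infeasible[0]
    elif rule == "least-infeasibility":
        i0 = min(infeasible, key=lambda i: abs(b[i]))
    else:
        raise ValueError(f"unknown rule: {rule}")
    # Objective x_r (artificial, positive) has reduced costs -a_{i0 j};
    # objective -x_r (negative value) has reduced costs +a_{i0 j}.
    sign = -1.0 if b[i0] > 0 else 1.0
    return i0, [sign * v for v in a[i0]]
```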

The fudge procedure, but not its name, is due to Gass [5, pp. 120-125]. A problem having negative solution values is augmented by a single artificial variable and subjected to a transformation yielding nonnegative solutions for the augmented problem. Specifically, the tableau is augmented by a column containing the entry -1 in each row having $b_i < 0$ and zeros elsewhere; and the desired tableau is obtained by pivoting on the Ith entry of the added column, where $b_I = \min_i b_i$. Subsequently the sum of all the artificial variables is minimized using the ordinary algorithm; when it has been reduced to zero, a feasible solution is at hand. (Of course other means of getting feasible could be used once the negativity has been removed.)
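The fudge transformation can be checked with a short Python sketch. It follows the description above literally (append the column of -1 entries, pivot in the row of the most negative $b_i$); the function name and return layout are illustrative assumptions. After the pivot every right-hand side is nonnegative: the pivot row gives $-b_I > 0$, every other row with a -1 entry gives $b_i - b_I \ge 0$, and the remaining rows are unchanged.

```python
from typing import List, Optional, Tuple


def fudge(a: List[List[float]], b: List[float]
          ) -> Tuple[List[List[float]], List[float], Optional[int]]:
    """Single-artificial transformation, as described in the text.

    Appends a column with -1 in each row having b_i < 0, then pivots on that
    column in the row I where b_I = min_i b_i. The new artificial becomes basic in
    position I; the returned last column belongs to the variable that left.
    Returns (new_a, new_b, I), with I = None if b was already nonnegative.
    """
    m = len(b)
    col = [-1.0 if b[i] < 0 else 0.0 for i in range(m)]
    if not any(col):
        return [list(row) for row in a], list(b), None
    aug = [list(a[i]) + [col[i]] for i in range(m)]
    I = min(range(m), key=lambda i: b[i])
    p = aug[I][-1]                                  # equals -1
    piv = [v / p for v in aug[I]]
    new_a, new_b = [], []
    for i in range(m):
        if i == I:
            row = piv[:]
            row[-1] = 1.0 / p
            new_b.append(b[I] / p)                  # -b_I > 0
        else:
            f = aug[i][-1]
            row = [aug[i][k] - f * piv[k] for k in range(len(piv))]
            row[-1] = -f / p
            new_b.append(b[i] - f * b[I] / p)       # b_i - b_I where f = -1, else b_i
        new_a.append(row)
    return new_a, new_b, I
```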

Table 5-1 lists the runs done using these five procedures, indicated at 
the left. The starting basis used is listed at the top. 

Table 5-1
FEASIBLE SOLUTION RUNS

Starting basis:                    N      S      F
Ordinary algorithm                 5     21     39
Extended composite algorithm      (5)   (21)    38
Sequential algorithm                     31     36
Least-infeasibility algorithm            33     37
Fudge procedure                   (5)   (21)    32

Runs indicated in parentheses were not done, since the same results would 
have been obtained as in the run whose number is given. 

Tables 5-2 and 5-3 give the results for these procedures, for bases S and F, relative to the ordinary procedure. The last line of each table is the proportion of infeasibilities in the starting basis for each problem. The amount of infeasibility does not seem to affect the relative efficiencies of these methods much, although it does affect the total work done, as the data for runs 21, 31, and 37 in the Appendix, or those of Table 4-4, show.

Table 5-2
PHASE ONE, S BASIS, RELATIVE TO ORDINARY ALGORITHM

Problem                1D    1E    1A    1G    1F    2B    1B   avg.
Sequential            .96   .94  1.52  1.00   .98   .97  1.04  1.06
Least-infeasibility  1.08  1.03  1.36  1.05  1.24  1.12   .94  1.12
Proportion infeas.    .81   .19   .70   .10   .35   .05   .60

Table 5-3
PHASE ONE, F BASIS, RELATIVE TO ORDINARY ALGORITHM

Problem                1D    1E    1A    1G    1F    2B    1B   avg.
Extended composite   1.00  1.08  1.42   .65  1.00  1.04  1.01  1.03
Sequential           1.12  1.33  1.17  1.50  2.00  1.51  1.08  1.39
Least-infeasibility  1.12  1.37  1.00  1.47  2.00  1.51  1.39  1.41
Fudge                1.25  1.08  1.25  1.00  2.00  1.34  1.01  1.27
Proportion infeas.    .41   .42   .30   .23   .26   .10   .46

The results pretty well establish the ordinary procedure as superior in getting feasible. Its objective is responsible, since the minimization algorithm is the same in all the runs. It seems that by moving in a direction tending to minimize the sum of all the infeasibilities we give more chance to a number of infeasibilities to leave, while the sequential and least-infeasibility procedures, concentrating on a single variable at a time, are too single-minded. Since several negative infeasibilities can be removed in one iteration, while only one artificial variable can, it is reasonable that the difference is more decisive for F bases than for S bases.

The extended composite procedure is somewhat disappointing. It might 
work better if, at the expense of considerably more calculation, the pivot 
column were chosen by the same criterion as is the pivot row. 



6. THE OPTIMAL SOLUTION 

Of greatest interest to the ordinary user is the amount of work required to solve a complete problem. In this section six algorithms are compared in the number of iterations required to obtain an optimal solution. All but one of these are designed to handle artificial variables; for them, the ordinary Phase One objective (the sum of all infeasibilities) is used, since it was found most efficient in Section 5. Unless otherwise noted, each procedure uses the same method for minimizing its objective in Phase One as it does in Phase Two, only the definition of the objective changing between the phases. Similarly, each procedure (except the "symmetric") uses the ordinary choice of pivot row. They differ primarily in the manner of choosing the pivot column.

The ordinary procedure was described in Section 3. 

The positive-normalized procedures (PN1 and PN2) can be viewed as representative of those proposals which aim at eliminating the effects of bad scaling of the problem data by dividing the reduced costs, used in choosing the pivot column, by some combination of the coefficients $a_{ij}$. The first of the two considered here, proposed by Dickson and Frederick [4], uses the formula

$$
d_j = c_j^2 \Big/ \Big( c_j^2 + \sum_i (a_{ij}^+)^2 \Big),
$$

where $a_{ij}^+ = \max(a_{ij}, 0)$ is the "positive part" of $a_{ij}$, choosing the pivot column as that j for which $d_j$ is maximal for $c_j < 0$. The procedure PN2 is essentially this, using instead the formula $d_j = c_j^2 / \sum_i (a_{ij}^+)^2$, which gives the same result.

The PN1 procedure employs the slightly simpler formula $d_j = c_j / \sum_i a_{ij}^+$, with the pivot column chosen for minimal $d_j$.
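The three normalized pricing rules can be compared side by side in a small Python sketch. The rule labels "PN1", "PN2", and "DF" are chosen here, not identifiers from the original routines, and a column with $c_j < 0$ but no positive entry is treated as signaling an unbounded objective. Because $c_j^2/(c_j^2 + S_j)$ is an increasing function of $c_j^2/S_j$, where $S_j = \sum_i (a_{ij}^+)^2$, the "DF" and "PN2" rules always pick the same column.

```python
from typing import List, Optional


def pn_column(a: List[List[float]], c: List[float], rule: str = "PN1") -> Optional[int]:
    """Pivot column by a positive-normalized rule, over columns with c_j < 0.

    rule: "PN1" minimizes c_j / sum_i a_ij^+;
          "PN2" maximizes c_j^2 / sum_i (a_ij^+)^2;
          "DF"  maximizes c_j^2 / (c_j^2 + sum_i (a_ij^+)^2)  (Dickson-Frederick).
    Returns None when no reduced cost is negative.
    """
    best_j, best_score = None, None
    for j, cj in enumerate(c):
        if cj >= 0:
            continue
        pos = [max(row[j], 0.0) for row in a]      # positive parts a_ij^+
        s1 = sum(pos)
        s2 = sum(v * v for v in pos)
        if s1 == 0:
            raise ValueError(f"column {j}: c_j < 0 and no positive entry (unbounded)")
        if rule == "PN1":
            score = -cj / s1              # maximizing -d_j == minimizing d_j
        elif rule == "PN2":
            score = cj * cj / s2
        elif rule == "DF":
            score = cj * cj / (cj * cj + s2)
        else:
            raise ValueError(f"unknown rule: {rule}")
        if best_score is None or score > best_score:
            best_j, best_score = j, score
    return best_j
```

With `a = [[1, 4, 0.5], [2, -1, 0.5]]` and `c = [-3, -4, -2]`, the ordinary rule would choose column 1 (the most negative $c_j$), while all three normalized rules choose column 2, whose positive entries are small.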

The greatest-change procedure was described long ago, but has been little used. That column is chosen which, after pivoting, will give the greatest decrease in the value of the objective; it is the j which minimizes the expression $c_j \min_i \{\, b_i/a_{ij} : a_{ij} > 0 \,\}$ for the change of the objective.
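The greatest-change rule needs a ratio test for every candidate column, not just for the chosen one, which is where its extra work per iteration comes from. A hypothetical Python helper in the same tableau convention:

```python
from typing import List, Optional


def greatest_change_column(a: List[List[float]], b: List[float],
                           c: List[float]) -> Optional[int]:
    """Column giving the greatest decrease of the objective after one pivot:
    the j (with c_j < 0) minimizing c_j * min_i {b_i / a_ij : a_ij > 0}.
    """
    best_j, best_change = None, 0.0
    for j, cj in enumerate(c):
        if cj >= 0:
            continue
        ratios = [b[i] / a[i][j] for i in range(len(b)) if a[i][j] > 0]
        if not ratios:
            raise ValueError(f"column {j}: objective unbounded")
        change = cj * min(ratios)                  # objective change if column j enters
        if best_j is None or change < best_change:
            best_j, best_change = j, change
    return best_j
```

With `a = [[1, 0.5], [2, 1]]`, `b = [4, 12]`, and `c = [-2, -1.5]`, the ordinary rule picks column 0, but the greatest-change rule picks column 1, whose step lowers the objective by 12 rather than 8.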

The ratio-pricing procedure was suggested informally by Markowitz some time ago. It differs from the ordinary procedure only in Phase One. Letting $w_j$ be the reduced cost for the infeasibility objective then, and $c_j$ be the reduced cost for the proper objective, the pivot column j is chosen so as to maximize $c_j/w_j$ for $w_j < 0$; we obtain the largest possible improvement in the proper objective per unit change of infeasibility. It may be viewed as an application of parametric linear programming [5]: defining $\theta^*$ at each iteration as the largest $\theta$ such that $c_j + \theta w_j \ge 0$ for all $w_j < 0$, the pivot column is chosen so as to increase $\theta^*$. Evidently when $\theta^*$ becomes sufficiently large we have all $w_j \ge 0$, and Phase One is ended. It turns out that almost all $c_j$ are then nonnegative, too, so that Phase Two is quite short. The aim of the procedure is to obtain a first feasible solution which is nearly optimal; the data of the Appendix for run 6 show that it does this well.
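Ratio pricing changes only the pricing step of Phase One. The Python sketch below assumes both sets of reduced costs are already available as lists; the function name is hypothetical. The column it returns is also the one attaining $\min_j(-c_j/w_j)$ over $w_j < 0$, that is, the column that defines $\theta^*$ in the parametric interpretation.

```python
from typing import List, Optional


def ratio_pricing_column(c: List[float], w: List[float]) -> Optional[int]:
    """Phase One column choice by ratio pricing.

    c: reduced costs of the proper objective; w: reduced costs of the
    infeasibility objective. Chooses j maximizing c_j / w_j over w_j < 0.
    Returns None when all w_j >= 0 (Phase One is over).
    """
    candidates = [j for j in range(len(w)) if w[j] < 0]
    if not candidates:
        return None
    return max(candidates, key=lambda j: c[j] / w[j])
```

For `c = [-3, 2, -1, 4]` and `w = [-1, -2, 0.5, -4]`, the ordinary Phase One rule would take column 3 (most negative $w_j$), but ratio pricing takes column 0, the only candidate that also improves the proper objective.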

The symmetric procedure of Talacko [9] is employed only with a full basis; it may take either "primal" or "dual" simplex method steps. For one iteration: among those columns with negative reduced costs, and those rows whose basic variables are nonnegative, a potential pivot is determined using the greatest-change procedure as described above; and among columns with positive reduced costs and rows with negative variables, a potential pivot is determined using the dual of the greatest-change procedure (for which the greatest increase of the objective is sought). That pivot is used for which the magnitude of the objective change is greater. If a step of the first kind is taken, all nonnegative basic variables stay nonnegative; if of the second kind, all nonnegative reduced costs stay nonnegative. The procedure does not always terminate in a solution of the problem [11, p. 10]; but it did for the test problems.
Table 6-1 compares all these, taking the ordinary procedure as the base (run 21).

Table 6-1
ALGORITHMS AND BASES COMPARED WITH ORDINARY, S BASIS

Problem                    1D    2A    1E    1A    5A    1G    1F    2B    1B   avg.  c.v.

Singleton basis
PN1, run 10               .76   .80   .96  1.00   .73   .66   .84   .82    -    .82    .1
PN2, run 14               .83   .82   .98   .93   .76   .73   .66   .90    -    .83    .1
Greatest-change, run 15   .70  1.10  1.23   .76   .86   .65   .73  1.21    -    .91    .2
Ratio-pricing, run 6      .59  1.62  1.42   .81   .95   .82   .55  1.35    -   1.01    .4

Full basis
Ordinary, run 39          .63  1.24   .92   .52  1.00  1.21   .45   .76   .76   .83    .3
Symmetric, run 40         .43   .82  1.53   .45   .84   .74   .49   .73    -    .75    .4
Greatest-change, run 41   .39   .82   .68   .55   .84   .94   .38   .90   .27   .64    .4

(Note: If problem 1B is eliminated from runs 39 and 41, the averages are .84 and .69.)

Of the runs with singleton basis, the positive-normalized procedures are outstanding, and over-all the greatest-change procedure with full basis is best. Unfortunately the positive-normalized procedures have not yet been tried with full bases; they might perform even better. Incidentally, the data of the Appendix show that, with the natural exception of ratio-pricing, the differences among the procedures are reflected in Phase One in about the same way as in the entire process.

The data of Table 6-1 allow the symmetric and greatest-change procedures to be compared directly with the ordinary procedure with full basis. Using run 39 as base, the averages and coefficients of variation obtained are: symmetric algorithm, .92 and .3; greatest-change algorithm, .78 and .3. The relative efficiencies of these procedures are not changed much by calculating them from the different base run.

It is of considerable interest to find some means of predicting the work needed for a problem about which little is known. In Table 4-4 it was found that the number M of constraints was the best guide of those studied to the number of iterations for Phase One; we shall use it also in connection with the total iterations required. Table 6-2 thus lists the number of iterations required to solve each of the problems using the ordinary algorithm divided by M. It would appear that the folklore rule of "2M iterations" is fairly good when a singleton basis is used.



Table 6-2
ITERATIONS/CONSTRAINTS FOR ORDINARY, S BASIS (RUN 21)

Problem   1D    2A    1E    1A    5A    1G    1F    2B    1B   avg.  c.v.
        2.00  1.67  1.71  1.27  1.09  1.29  1.83  1.18  3.33  1.71   .4

The corresponding data for the algorithms of Table 6-1 can be found by 
multiplying the entries of Table 6-2 by those of Table 6-1. The averages 
thus obtained appear in Table 6-3. 
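As a check on the statement that Table 6-3 is obtained by multiplying Table 6-2 by Table 6-1, the short Python script below recomputes three of its averages from the tabulated values. Run 10 has only eight entries, and the sketch assumes the missing one is problem 1B (the largest problem); with that assumption the averages agree with Table 6-3.

```python
# Table 6-2: iterations / M for the ordinary algorithm, S basis (run 21).
PROBLEMS = ["1D", "2A", "1E", "1A", "5A", "1G", "1F", "2B", "1B"]
RUN_21_PER_M = [2.00, 1.67, 1.71, 1.27, 1.09, 1.29, 1.83, 1.18, 3.33]

# Rows of Table 6-1 (ratios to run 21); None marks the missing 1B entry.
TABLE_6_1 = {
    "PN1 (run 10)": [.76, .80, .96, 1.00, .73, .66, .84, .82, None],
    "Ordinary, full basis (run 39)": [.63, 1.24, .92, .52, 1.00, 1.21, .45, .76, .76],
    "Greatest-change, full basis (run 41)": [.39, .82, .68, .55, .84, .94, .38, .90, .27],
}


def iterations_per_constraint(ratios):
    """Average over the problems present of (ratio to run 21) * (run 21 iterations / M)."""
    values = [r * q for r, q in zip(ratios, RUN_21_PER_M) if r is not None]
    return sum(values) / len(values)


for name, ratios in TABLE_6_1.items():
    print(f"{name}: {iterations_per_constraint(ratios):.2f}")
# Prints 1.24, 1.39 and 0.98, the corresponding entries of Table 6-3.
```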

Table 6-3
SUMMARY OF ITERATIONS/CONSTRAINTS

Algorithm          Run   Average   C.V.

Singleton basis
Ordinary            21     1.71     .4
PN1                 10     1.24     .2
PN2                 14     1.24     .2
Greatest-change     15     1.36     .3
Ratio pricing        6     1.50     .4

Full basis
Ordinary            39     1.39     .4
Symmetric           40     1.13     .5
Greatest-change     41      .98     .2

A more detailed examination of the data seems to show that the dependence of the number of iterations on M could be better expressed by a formula of the form $aM^b$, where b is slightly less than one, but this is not clear. Using a singleton basis, an estimate of between M and 3M iterations will almost always be correct.
7. SUBOPTIMIZATION

Versions of suboptimization have been used for some time in linear programming routines bothered by small core size, but the advantages of a version of it for routines for which core size is no particular handicap were first exploited by D. M. Smith [9]. As used here, the course of the solution of a problem consists of a number of passes, at the beginning of each of which some number L of nonbasic columns is selected as a set of candidates for pivoting (those having the L minimal reduced costs are chosen). During the pass no other nonbasic columns are considered; simplex method iterations are performed using the selected columns until the objective has been minimized on that subset. (A basic column which becomes nonbasic during the pass is not further considered.)
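The pass structure can be sketched as a minimal Phase Two routine in Python that applies the ordinary rule within each pass and drops each column that has just left the basis from the candidate list, as described above. It assumes a feasible starting tableau, does no degeneracy handling, and uses hypothetical names throughout. The test problem is a small textbook example, maximize $3x_1 + 5x_2$ subject to $x_1 \le 4$, $2x_2 \le 12$, $3x_1 + 2x_2 \le 18$, written as a minimization with the slack variables as the starting basis.

```python
from typing import List, Tuple


def pivot(a: List[List[float]], b: List[float], c: List[float],
          z: float, I: int, J: int) -> float:
    """In-place tableau exchange (the text's convention); returns the new objective value."""
    p, n = a[I][J], len(c)
    row = [v / p for v in a[I]]
    row[J] = 1.0 / p
    bI = b[I] / p
    for i in range(len(a)):
        if i != I:
            f = a[i][J]
            a[i] = [a[i][k] - f * row[k] for k in range(n)]
            a[i][J] = -f / p
            b[i] -= f * bI
    cJ = c[J]
    c[:] = [c[k] - cJ * row[k] for k in range(n)]
    c[J] = -cJ / p
    a[I], b[I] = row, bI
    return z + cJ * bI


def suboptimize(a, b, c, L: int, z: float = 0.0) -> Tuple[float, int, int]:
    """Phase Two with suboptimization on L candidates; returns (z, passes, iterations)."""
    passes = iterations = 0
    while True:
        # Start of a pass: the L columns with the most negative reduced costs.
        cand = sorted((j for j in range(len(c)) if c[j] < 0), key=lambda j: c[j])[:L]
        if not cand:
            return z, passes, iterations           # all c_j >= 0: optimal
        passes += 1
        while True:
            live = [j for j in cand if c[j] < 0]
            if not live:
                break                              # objective minimized on the subset
            J = min(live, key=lambda j: c[j])
            rows = [i for i in range(len(b)) if a[i][J] > 0]
            if not rows:
                raise ValueError("objective unbounded")
            I = min(rows, key=lambda i: b[i] / a[i][J])
            z = pivot(a, b, c, z, I, J)
            iterations += 1
            # Column J now holds the variable that just left the basis;
            # it is not considered again during this pass.
            cand.remove(J)
```

On this example both L = 1 and L = 2 reach the optimum z = -36 in two iterations, but L = 2 does so in one pass instead of two.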

The number L of candidates is an important parameter; values of 2, 3, 5, and 8 were used here. During a pass, any of the various means of selecting a pivot column discussed previously might be used in minimizing the objective on the candidates. Three were tried here: the ordinary procedure of minimal reduced cost; the greatest-change procedure; and the procedure PN1.

Both the number of iterations and the number of passes required to solve a problem are of interest. In Table 7-1, the numbers required are all compared with the number of iterations used by the ordinary simplex method (run 21), which would be the number of passes for any of the algorithms for L = 1. Only the averages and coefficients of variation are given for these runs; the individual data fluctuate considerably less than in most of our experiments. An interesting feature of the raw data not reflected in the averages is that the greatest-change procedure commonly requires fewer iterations under suboptimization than does the ordinary procedure without it, which is generally not the case for the other methods.

Table 7-1
SUBOPTIMIZATION RUNS COMPARED TO ORDINARY ALGORITHM

                                  Iterations          Passes
Run   Algorithm          L    average   c.v.    average   c.v.
22    ordinary           2     1.15      .2       .72      .2
27    ordinary           3     1.28      .2       .60      .2
28    ordinary           5     1.26      .2       .45      .2
29    ordinary           8     1.31      .3       .37      .4
23    greatest-change    2     1.07      .2       .72      .2
24    greatest-change    3     1.08      .2       .59      .2
25    greatest-change    5     1.08      .3       .45      .3
26    greatest-change    8     1.13      .3       .40      .3
42    PN1                2     1.15      .1       .72      .2
43    PN1                3     1.22      .2       .58      .2
The term "pass" arises from the fact that it is only necessary to consult the data for the entire problem once during a pass; the data which have to be retained for the subsequent suboptimization are much fewer. This fact makes suboptimization particularly valuable in product form routines and those which use tapes extensively. (The three main forms of the simplex method are discussed in Section 8.) The significance of the statistics above depends on the form of routine used. In the product form, the total work done depends largely on the number of passes; in the standard form, on the number of iterations; and the explicit form is intermediate. Thus suboptimization is of value in the product form even for all-in-core routines, but not in the standard form. Three production linear programming routines now use it in the manner described. They are all product form routines, one using the ordinary algorithm with L = 2 [1], another the greatest-change algorithm with L = 2 [9], and the third has options for either algorithm and any L up to 5 [7].



8. OPERATIONS AND FORMS 

So far we have been concerned only with the number of iterations required to solve a problem. A better guide to the computational efficiency of a procedure is the number of floating-point arithmetic operations performed: the work which must be done no matter how the algorithm is implemented. While logic and bookkeeping time are usually appreciable, and vary between different algorithms and different forms of the simplex method, it is precisely in such nonarithmetic work that computers and programming systems differ the most. Having programmed each of the procedures studied here as economically as we could from the standpoint of arithmetic, we feel that the results on arithmetic work come close to a machine-independent measure of efficiency.

Although it would be possible to count separately each elementary operation, it turns out that only three combinations of elementary floating-point operations are used significantly often in each of the major subdivisions of an iteration:

- addition and multiplication;
- division and subtraction;
- addition alone.

Each of the following three groups is thus called one operation:

    1 floating add and 1 floating multiply      (17.4)
    1 floating divide and 1 floating add        (19.4)
    3 floating adds                             (19.2)

The average number of 7090 cycles taken by each combination is given in parentheses. Some error is made in considering all these equivalent, but it is very small, because the first combination accounts for almost all of the calculations. In all cases (except for a portion of the reverse-transformation calculation in the product form) an operation is counted only when both operands are nonzero.
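The counting rule can be illustrated on a single tableau pivot. The Python sketch below is not the SCEMP code. It pivots a small dense tableau stored as a list of rows. It counts one operation for each multiply-and-add update and skips any update in which either operand is zero. The function name `pivot_counting_ops` is hypothetical. The divisions that normalize the pivot row are deliberately left out of the count to keep the example short.

```python
from typing import List


def pivot_counting_ops(T: List[List[float]], r: int, s: int) -> int:
    """Pivot tableau T in place on element T[r][s].

    Returns the number of add-and-multiply operations performed, counting an
    update only when both of its operands are nonzero.
    """
    piv = T[r][s]
    if piv == 0:
        raise ValueError("pivot element must be nonzero")
    # Normalize the pivot row (these divisions are not counted here).
    T[r] = [v / piv for v in T[r]]
    ops = 0
    for i, row in enumerate(T):
        if i == r:
            continue
        f = row[s]
        if f == 0:
            continue  # every product in this row would have a zero operand
        for j, prj in enumerate(T[r]):
            if prj != 0:
                row[j] -= f * prj  # one multiply plus one add = one operation
                ops += 1
    return ops


T = [[2.0, 4.0, 0.0], [1.0, 0.0, 3.0], [0.0, 5.0, 1.0]]
print(pivot_counting_ops(T, 0, 0), T)
```

On sparse data, most of the potential updates are skipped. This is why the factors of proportionality in the formulas below lie between zero and one.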

There are many ways of calculating the data required for the steps of the simplex method. In all of them the data used in the ordinary procedure are obtained, but in different ways. The three main forms of the method are described below in outline; the details may be found in the literature [5, 6].

In considering the number of operations performed in one iteration in any form, it is convenient to have a priori estimates in terms of:

- M, the number of constraints;
- N, the current number of variables;
- M + K, the total number of rows of data.

In the formulas below, factors of proportionality $\theta_i$ between zero and one reflect the fact that operations involving zero data are not counted. Quantities of order smaller than $M^2$ are disregarded.

The standard form is carried out just as the ordinary procedure is described in Section 3. Pivoting in the tableau is most of the work [requiring $\theta_1(M+K)(N-M)$ operations].

Both forms of the "revised simplex method" calculate needed items of the tableau by multiplying parts of the original matrix A by parts of the inverse, that is, the inverse of the (M + K)-order matrix consisting of the basic columns of A. The steps of an iteration are as follows:

1. The reduced costs are obtained by multiplying A by the prices, the row of the inverse corresponding to the objective row of A [$\theta_2(M+K)(N-M)$ operations].
2. The selected pivot column of the tableau is obtained by multiplying the appropriate column of A by the inverse. (The number of operations required for this and the remaining steps differs for the two forms.)
3. The pivot row is selected as usual.
4. Pivoting is done both in the inverse and in the current solution.

In the explicit form, or the "revised simplex method with explicit form of the inverse," the inverse is a square matrix of order M + K, all of which is pivoted at each iteration. Pivoting requires $\theta_4(M+K)^2$ operations, and the prior multiplication for the pivot column requires $\theta_3(M+K)^2$ operations.
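Below is a minimal sketch of the explicit-form update, assuming a dense NumPy array for the inverse. The pivot column is first obtained by multiplying the entering column by the current inverse, and then every row of the inverse is pivoted. The function name is hypothetical. The sketch treats the inverse as an ordinary square matrix: it does not model the objective rows that make its order M + K, and it does not skip zero entries as the counted routines do.

```python
import numpy as np


def update_explicit_inverse(Binv: np.ndarray, a_s: np.ndarray, r: int) -> np.ndarray:
    """Return the inverse of the new basis after column a_s replaces basic column r.

    Binv is the current explicit inverse; a new array is returned.
    """
    alpha = Binv @ a_s                  # pivot column of the tableau (multiplication step)
    if alpha[r] == 0.0:
        raise ValueError("pivot element is zero")
    new = Binv.copy()
    new[r] = Binv[r] / alpha[r]         # divide the pivot row of the inverse
    for i in range(Binv.shape[0]):
        if i != r:
            new[i] = Binv[i] - alpha[i] * new[r]   # pivot every other row
    return new


B = np.array([[2.0, 1.0], [1.0, 3.0]])
a = np.array([1.0, 1.0])
B_new = B.copy()
B_new[:, 0] = a
print(np.allclose(update_explicit_inverse(np.linalg.inv(B), a, 0), np.linalg.inv(B_new)))
```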

In the product form, or the "revised simplex method with product form of the inverse," the inverse is maintained as a sequence of transformations. Each transformation has at most M + K nonzero entries and constitutes the nontrivial portion of the pivot column of the tableau as of some previous iteration. Applied appropriately, these transformations accomplish the work of matrix multiplication required by the revised simplex method. In a pivot step these data are not altered but are augmented by one more transformation. Their total number is generally somewhat less than $(M+K)^2$. Most, but not all, of them are used once in obtaining the prices [$\theta_5(M+K)^2$ operations] and once in obtaining the pivot column [$\theta_6(M+K)^2$ operations].

The number of accumulated transformations is periodically reduced by "reinversions," the reconstruction of a product-form inverse from A in a minimal sequence of pivots. The routines used here reinvert automatically at those points which they determine will minimize the total operation count for the calculation.
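The product form can be sketched as a list of stored pivot columns, often called an eta file in later literature. The Python sketch below makes two simplifying assumptions: the starting basis is the identity, and each transformation is stored as a dense NumPy vector.

- `ftran` applies the transformations oldest first to compute the product of the inverse with a column.
- `add_column` appends one transformation per pivot without altering the earlier ones.

The names `Eta`, `ftran`, and `add_column` are illustrative and are not taken from the routines studied. The row-vector pass used to obtain the prices is omitted.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Eta:
    r: int            # pivot row of the iteration that created this transformation
    col: np.ndarray   # pivot column of the tableau at that iteration


def ftran(etas: List[Eta], v) -> np.ndarray:
    """Compute (inverse of current basis) @ v by applying the transformations in order."""
    v = np.array(v, dtype=float)
    for e in etas:                  # oldest transformation first
        t = v[e.r] / e.col[e.r]
        if t != 0.0:                # no work is needed when the entry is zero
            v -= t * e.col
        v[e.r] = t
    return v


def add_column(etas: List[Eta], a, r: int) -> None:
    """Pivot column a into basis position r by appending one transformation."""
    alpha = ftran(etas, a)          # pivot column of the tableau
    if alpha[r] == 0.0:
        raise ValueError("pivot element is zero")
    etas.append(Eta(r, alpha))


etas: List[Eta] = []
add_column(etas, [2.0, 1.0], 0)     # basis becomes [[2, 0], [1, 1]]
add_column(etas, [1.0, 3.0], 1)     # basis becomes [[2, 1], [1, 3]]
print(ftran(etas, [5.0, 10.0]))
```

The earlier transformations are never rewritten. This is the property that makes the product form economical in passes over the data.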

In summary, the formulas of Table 8-1 indicate the dependence of the 
number of operations per iteration on problem size. 

Table 8-1
NUMBER OF OPERATIONS PER ITERATION

Standard form    $\theta_1(M+K)(N-M)$
Explicit form    $\theta_2(M+K)(N-M) + \theta_3(M+K)^2 + \theta_4(M+K)^2$
Product form     $\theta_2(M+K)(N-M) + \theta_5(M+K)^2 + \theta_6(M+K)^2$

It is beyond the scope of this study to discuss the factors of these formulas in detail. They will be used instead as guides to the scaling of our operation counts. For our problems N is closely proportional to M: N/M ranges from 1.67 to 3.43, averaging 2.31 with coefficient of variation 0.27. Each of the formulas therefore has, approximately, $(M+K)^2$ as a common factor.

Comparative data for the three forms can thus be obtained as follows. For each problem, divide the total number of operations required to solve it by the number of iterations, and divide the result by $(M+K)^2$. The data of Table 8-2 were obtained in that way. For all forms, the ordinary simplex algorithm was used, with an S basis.
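The Table 8-1 estimates and the normalization used for Table 8-2 can be written out directly. In this sketch the density factors $\theta_1, \ldots, \theta_6$ are inputs that the caller must supply. The document gives no values for them, so any numbers used are assumptions. The function names are hypothetical.

```python
def ops_per_iteration(form: str, M: int, K: int, N: int, theta: dict) -> float:
    """Estimated operations per iteration from the Table 8-1 formulas.

    theta maps the indices 1..6 to density factors between 0 and 1 (assumed inputs).
    """
    mk = M + K
    if form == "standard":
        return theta[1] * mk * (N - M)
    if form == "explicit":
        return theta[2] * mk * (N - M) + (theta[3] + theta[4]) * mk ** 2
    if form == "product":
        return theta[2] * mk * (N - M) + (theta[5] + theta[6]) * mk ** 2
    raise ValueError(f"unknown form: {form}")


def normalized_ops(total_ops: float, iterations: int, M: int, K: int) -> float:
    """Operations per iteration divided by (M + K)**2, the scaling used for Table 8-2."""
    return total_ops / iterations / (M + K) ** 2


# With every factor set to 1 (fully dense data), for M = 10, K = 1, N = 23:
dense = {i: 1.0 for i in range(1, 7)}
for form in ("standard", "explicit", "product"):
    print(form, ops_per_iteration(form, 10, 1, 23, dense))
```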

Table 8-2
OPERATIONS PER ITERATION/$(M+K)^2$

Problem 1D 2A 1E 1A 5A 1G 1F 2B 1B avg. c.v.

Standard form (run 12) .88 3.00 2.11 .44 1.28 .79 .76 .59 1.23 .7

Explicit form (run 21) .71 .95 .77 .49 .54 .31 .38 .29 .67 .57 .4

Product form (run 56) .56 1.00 .54 .32 .45 .27 .25 .14 .26 .42 .4

The decrease of the ratios with size of problem is noteworthy. It is probably due to the decrease of the proportion of nonzero matrix entries. Table 8-3 makes a more direct comparison of these data, using the explicit-form run as a base. Note that the relative efficiency of the product form tends to increase with the size of problem. We think this is owing to its greater ability to take advantage of the lower density of nonzeros.
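The avg. and c.v. columns summarize each row over the nine problems. A small Python sketch reproduces the printed averages for the explicit-form and product-form rows of Table 8-2, using the values in the order printed. The text does not say how the coefficient of variation was computed. The sketch assumes the standard deviation divided by the mean, using the population standard deviation, so its c.v. values may differ from the printed ones.

```python
from statistics import mean, pstdev


def avg_and_cv(values):
    """Mean and coefficient of variation (population std / mean) of one table row."""
    m = mean(values)
    return m, pstdev(values) / m


explicit_run21 = [.71, .95, .77, .49, .54, .31, .38, .29, .67]
product_run56 = [.56, 1.00, .54, .32, .45, .27, .25, .14, .26]
for name, row in [("explicit", explicit_run21), ("product", product_run56)]:
    m, cv = avg_and_cv(row)
    print(f"{name}: avg {m:.2f}, c.v. {cv:.2f}")
```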

Table 8-3
STANDARD AND PRODUCT COMPARED WITH EXPLICIT FORM

Problem 1D 2A 1E 1A 5A 1G 1F 2B 1B avg. c.v.

Standard form 1.28 4.67 3.03 .83 2.30 2.57 2.04 2.63 2.42 .4

Product form .82 1.05 .74 .64 .79 .86 .67 .45 .43 .71 .3

9. ALGORITHMS COMPARED BY OPERATIONS

The algorithms of Section 6 may finally be compared in the total number of operations they require to solve a problem. In Table 9-1 they are all compared with the ordinary algorithm in explicit form (run 21). With the exception of that procedure, each algorithm given has been run in the form of the simplex method best suited to it. The ratio-pricing and the greatest-change (with F basis) procedures are omitted because they were not so run.

Table 9-1
VARIOUS ALGORITHMS COMPARED WITH ORDINARY, EXPLICIT

Problem 1D 2A 1E 1A 5A 1G 1F 2B 1B avg. c.v.

Singleton basis

PN1, standard; run 10: .75 2.14 2.42 .67 1.54 1.21 1.32 1.37 1.43 .4

PN2, standard; run 14: 1.07 2.53 2.84 .53 1.79 1.52 1.18 1.82 1.66 .4

Greatest-change, standard; run 15: 1.15 3.66 3.72 .33 2.37 .99 1.66 2.36 2.03 .6

Ordinary, product; run 56: .82 1.05 .74 .64 .79 .86 .67 .45 .43 .71 .3

Full basis

Symmetric, standard; run 40: .57 4.80 6.04 .47 2.81 2.64 1.01 1.42 2.47 .8

Ordinary, product; run 55: .63 1.48 .81 .44 .89 .86 .25 .37 .38 .68 .5

We think that these figures constitute the best over-all assessment of 
these alternative algorithms from the point of view of calculation needed. 
The product form of the ordinary algorithm seems definitely superior, and use of a full basis is probably worthwhile for the larger problems.

We may try to predict the operation count for an unknown problem of given size. In Table 9-2, the counts of run 21 have been scaled in a manner intended to eliminate most of the influence of the size of the problem. We use the factor $(M+K)^2$, as in Table 8-2, to scale the count per iteration, and the factor M, as in Table 6-3, to scale the number of iterations. This gives the quotients of Table 9-2. The corresponding quotients for the other runs can be obtained by multiplying those of Table 9-1 by these numbers. The averages and coefficients of variation for those ratios are given in Table 9-3.

Table 9-2
OPERATIONS/$M(M+K)^2$ FOR ORDINARY, EXPLICIT

Problem   1D    2A    1E    1A    5A    1G    1F    2B    1B    avg.  c.v.
          1.42  1.58  1.32  .62   .59   .40   .70   .34   2.22  1.02  .6

Table 9-3
OPERATIONS/$M(M+K)^2$ FOR OTHER RUNS

Algorithm          Form       Basis   Run   Average   C.V.

PN1                standard   S       10    1.36      .8
PN2                standard   S       14    1.59      .9
greatest-change    standard   S       15    2.04      1.0
ordinary           product    S       56    .73       .6
symmetric          standard   F       40    2.57      1.2
ordinary           product    F       55    .73       .9

Since in practice K is usually 1, we can say that around $M^3$ operations are required to solve a linear programming problem. A rough minimum for problems of no more than some 100 constraints is $0.3M^3$, and $2M^3$ is a rough maximum for smaller problems. A count of more than $3M^3$ indicates an uncommonly hard problem or a rather poor algorithm.
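The rule of thumb can be applied as a quick check on a finished run. This sketch takes K = 1, which the text says is usual. The helper names are hypothetical, and the thresholds are the rough figures quoted above, not sharp bounds.

```python
def cubic_ratio(total_ops: float, M: int) -> float:
    """Total operation count divided by M**3 (K assumed to be 1)."""
    return total_ops / M ** 3


def rough_range(M: int):
    """Rough (minimum, typical) operation counts quoted for problems of up to about 100 rows."""
    return 0.3 * M ** 3, 1.0 * M ** 3


def looks_uncommonly_hard(total_ops: float, M: int) -> bool:
    """More than 3 M^3 operations suggests a hard problem or a poor algorithm."""
    return cubic_ratio(total_ops, M) > 3.0


print(rough_range(100), cubic_ratio(1.5e6, 100), looks_uncommonly_hard(4.0e6, 100))
```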

10. CONCLUSION 

Three kinds of data have been used above: iterations, operations, and passes. We have come to the view that the iteration count alone is the least informative. The operation count measures the total work of a routine, while passes measure the amount of data handled. Of course, except for those of Section 7, suboptimization is not used in any of the routines studied. In general, therefore, the number of passes is equal to the number of iterations, which is the number we usually cite.

The results of Section 4 show that use of a full basis will reduce the iterations taken in Phase One. (In Section 9, however, we found that it is of little value in reducing the operation count for the most efficient procedure.) It appears that there is no excuse for using an entirely artificial basis.

In Section 5 we failed to find any measure of infeasibility for conducting Phase One which works better than the ordinary measure, the sum of all the infeasibilities.

The results of Section 6 show two things. First, the positive-normalized procedures are best in terms of iteration count. Second, the full basis is good for the over-all problem.

The first conclusion is consistent with the interesting results of Kuhn and Quandt [8]. They experimented with several pivot-column selection procedures on a large number of randomly generated linear programming problems of special type having up to 25 constraints. In the only place where their results can be matched with ours, we agree in ordering these procedures in increasing effectiveness in iteration count: ordinary, greatest-change, and positive-normalized. Our data suggest M and 3M as bounds for the number of iterations to solve a problem starting from a singleton basis.

The extent to which suboptimization will be of value in a routine depends 
considerably on how its data-handling is organized. Section 7 shows that it 
can be used with little harm and under some circumstances with benefit to 
the total computational labor. 

The comparison of operations per iteration in Section 8 shows fairly definitely that the order of the three main forms of the simplex method in increasing efficiency is: standard, explicit, product. The algorithms which are better than the ordinary in iterations need data which are conveniently obtained only in the standard form. This makes them less attractive from the point of view of operation count; Section 9 shows that the ordinary algorithm in product form leads the rest.

There are other considerations, however, for general uses of a linear programming routine. They are hard to evaluate properly, but they argue for the standard form: in that form most of the data needed for the usual postoptimal analyses (reduced costs, etc.) are immediately available and need not be specially calculated.

An important fact about the product form, whose detailed study is beyond 
the scope of this report, is that the product-form inverse is extremely 
compact for problems of low density. This fact has considerable bearing 
on the choice of a routine for larger problems. SHARE problem 4A, having 
245 constraints, can be solved with an all-in-core routine [1] for the 
IBM 7090, which has 32,768 words of core. A similar routine using the 
explicit form would require 75,000 words, and using the standard form, 
118,000 words. 

At this time we feel that the best features of the procedures we have studied so far would be combined in a product-form routine that:

- employs the ordinary or the greatest-change algorithm;
- uses suboptimization;
- provides an option for using a full basis.

It may seem disappointing that our results have not allowed a more decisive ordering of the proposals studied. In part, of course, this is due to our having selected the more promising possibilities from a larger number of candidates. But it may also be the case that, as linear programming is presently understood, it is not possible to do a great deal better than some of these procedures do.

A linear programming method has two parts: find the optimal basis, and calculate the optimal solution. If the optimal basis were known, it would still in the general case require some $\frac{1}{3}M^3$ operations to solve the linear equations thus identified. (A product-form method would do much better for problems, like ours, having a low density of data.) Since some of our procedures do the whole job in about $M^3$ operations, there does not seem to be an enormous amount of room for improvement.
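The figure of about $\frac{1}{3}M^3$ is the familiar cost of Gaussian elimination on a dense system. The sketch below simply counts the multiply-and-add updates of forward elimination on an M by M matrix. It leaves out the divisions for the multipliers, the right-hand side, and back substitution, which together contribute only terms of order $M^2$. The function name is hypothetical. The exact count is $(M-1)M(2M-1)/6$, so its ratio to $M^3/3$ approaches one as M grows.

```python
def elimination_multiply_adds(M: int) -> int:
    """Count multiply-add updates in forward elimination of a dense M x M matrix."""
    ops = 0
    for k in range(M):            # eliminate the entries below pivot (k, k)
        remaining = M - k - 1     # rows below the pivot, and columns to its right
        ops += remaining * remaining
    return ops


# Compare the count with M**3 / 3 for a problem of 100 constraints.
M = 100
count = elimination_multiply_adds(M)
print(count, count / (M ** 3 / 3))
```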



APPENDIX 
The SCEMP Runs and Data 

Nature of Run

(Note: These abbreviations are used: for bases, N = None, S = Singleton, F = Full; for forms of the simplex method, S = Standard, E = Explicit, P = Product.)

        Starting
Run     basis     Form    Algorithm

 5      N         P       Ordinary
 6      S         E       Ratio-pricing
 8      N         E       Ratio-pricing
10      S         S       Positive-normalized 1
12      S         S       Ordinary
14      S         S       Positive-normalized 2
15      S         S       Greatest-change
21      S         E       Ordinary
22      S         S       Ordinary with suboptimization; L = 2
27      S         S       Ordinary with suboptimization; L = 3
28      S         S       Ordinary with suboptimization; L = 5
29      S         S       Ordinary with suboptimization; L = 8
23      S         S       Greatest-change with subopt.; L = 2
24      S         S       Greatest-change with subopt.; L = 3
25      S         S       Greatest-change with subopt.; L = 5
26      S         S       Greatest-change with subopt.; L = 8
31      S         E       Sequential Phase One
32      F         E       Fudge Phase One
33      S         E       Least-infeasibility Phase One
36      F         E       Sequential Phase One
37      F         E       Least-infeasibility Phase One
38      F         E       Extended composite Phase One
39      F         E       Ordinary
40      F         S       Symmetric
41      F         E       Greatest-change
42      S         S       PN1 with suboptimization; L = 2
43      S         S       PN1 with suboptimization; L = 3
55      F         P       Ordinary
56      S         P       Ordinary

These abbreviations are used:

- p1: iterations in Phase One;
- p2: total iteration count to solve the problem;
- pa: number of passes to solve the problem;
- op: total operation count to solve the problem, where "K" stands for "000".



[Table: p1, p2, pa, and op by run for problems 1A, 1B, 1D, 1E, 1F, 1G, 2A, 2B, 5A]

Parametric Linear Programming

Robert L. Graves

As it is ordinarily discussed, parametric linear programming is concerned with two problems. In both of them it is desired to find the solution to a linear programming problem as a function of a parameter which enters the problem linearly. These problems are discussed in Refs. 1 and 2.

There are two natural extensions, which are investigated in Refs. 3, 4, and 5. Carpentier and Saaty discuss the problem when the parameter enters in a nonlinear manner. Simons gives some characterization of the solution when the parameter is a vector. Neither Ref. 3 nor Ref. 4 exhibits a complete constructive solution.

Here the former problem is discussed, and an analysis is given for polynomial functions. Naturally, in this particular situation a more complete characterization of the solution can be given than is possible for more general nonlinear functions.



THE RESULTS IN THE LINEAR CASES 

The situation in the two problems which arise when the parameter enters linearly is summed up in two theorems. In the following, A denotes an m by n matrix (n > m), c an n-vector, b an m-vector, x an n-vector, u an m-vector, and y a scalar.

Theorem 1: Consider the linear programming problem

$$f(y) = \max\,(c + y c_1)\,x, \qquad Ax = b, \qquad x \ge 0$$

and its dual problem

$$f(y) = \min\, b u, \qquad uA - (c + y c_1) = d + y d_1 \ge 0.$$

Then the solutions, x and u, and the value, f, can be characterized as follows:

a. There exists a finite connected set (possibly empty) of closed intervals, $[y_0, y_1]$ (some of which may be points), on which the problem has a solution. The set of intervals may include $(-\infty, y_1]$ and $[y_0, \infty)$ as well. Outside the set of intervals, the problem has no solution.

b. On each interval the components of x are constants.

c. On each interval the components of u are linear functions of y.

d. On each interval f is a linear function of y.

e. The function f is convex.

Theorem 2: Consider the linear programming problem

$$f(y) = \max\, c x, \qquad Ax = b + y b_1, \qquad x \ge 0$$

and its dual problem

$$f(y) = \min\,(b + y b_1)\,u, \qquad uA - c = d \ge 0.$$

Then the solutions, x and u, and the value, f, can be characterized as in Theorem 1 by exchanging x and u and replacing f by -f.

Of equal importance is the fact that constructive methods exist which allow numerical solutions to be exhibited explicitly. The method which accompanies Theorem 1 is a variant of the primal simplex method, while that for Theorem 2 is conveniently stated as a variant of the dual simplex method.



THE POLYNOMIAL CASE

The problem considered here is

$$f(y) = \max\,(c + y c_1 + \cdots + y^g c_g)\,x = \max\, c(y)\,x \tag{1}$$

$$Ax = b + y b_1 + \cdots + y^h b_h = b(y) \tag{2}$$

$$x \ge 0 \tag{3}$$

and its dual

$$f(y) = \min\,(b + y b_1 + \cdots + y^h b_h)\,u = \min\, b(y)\,u \tag{4}$$

$$uA \ge c + y c_1 + \cdots + y^g c_g = c(y) \tag{5}$$

Relation (5) may be written

$$uA - c(y) = d(y) \ge 0.$$
The facts about the solution are contained in Theorem 3.

Theorem 3: There are solutions, x and u, and a value, f, of the problem defined by (1)-(5) which can be characterized as follows:

a. There exists a finite (but not necessarily connected) set of intervals $[y_0, y_1]$ (some of which may be points) in which the problem has a solution. The set of intervals may include $(-\infty, y_1]$ and $[y_0, \infty)$ as well. Outside the set of intervals the problem has no solution.

b. On each interval the components of x are polynomials in y of degree at most h.

c. On each interval the components of u and d are polynomials in y of degree at most g.

d. On each interval f is a polynomial in y of degree at most g + h.

Proof: Let B be an arbitrary m × m submatrix of A. Consider the equations

$$B\bar{x} = b(y), \qquad uB = \bar{c}(y)$$

where $\bar{x}$ and $\bar{c}$ are m-vectors whose components are defined by extraction from the relation of B to A.

If B is nonsingular, then the equations have a unique solution, and clearly the components of $\bar{x}$ and u are polynomials in y of the desired degree. The finite set of polynomials has only a finite number of roots. Hence there is, at most, a finite number of half-open infinite and closed finite intervals (which may be points) in which

$$uA \ge c(y), \qquad \bar{x} \ge 0$$

are also satisfied. In these intervals, the duality theorem asserts that a solution to the linear programming problem exists. The value of f(y) is given by the common value of b(y)u and c(y)x, which is a polynomial of degree at most g + h.

Suppose B is of rank r < m. Then the matrix $[B \mid b(y)]$ has rank r either for every y, or for a discrete set of y, or for no value of y.

- In the last case, there is no solution, $\bar{x}$, associated with B.
- In the first case, there are solutions to $B\bar{x} = b(y)$ which are polynomials in y of degree at most h.
- In the second case there are solutions for a set of discrete values of y.

If the same analysis is made for the matrix $[B^T \mid \bar{c}^T(y)]$, it follows that the pair of equations has solutions of the desired polynomial form either for every y or for a discrete set of values of y. In either case, the same conclusions which were demonstrated when B is nonsingular are true.
If there is any solution to the linear programming problem for a given value of y, there is a basic solution. Each basic solution arises from some matrix B of the sort just described. There are finitely many such matrices; hence there is only a finite number of intervals and points for which different polynomial representations of the solution exist. This concludes the proof.

A constructive method is probably of greater interest than the proof given above. The proof that the constructive method is finite will need two facts: the number of submatrices, B, is finite, and the polynomial components of x and of $uA - c(y) = d(y)$ have only a finite number of zeros. Before the algorithm is given, a small numerical example will be examined.



An Example

Consider the problem

$$f(y) = \max\; -4(y-4)(y-9)\,x_3$$

$$x_1 + x_3 = b_1(y)$$

$$x_2 + x_3 = b_2(y)$$

$$x_1, x_2, x_3 \ge 0$$

This problem can be solved by exhibiting all of the bases with their tableaus and finding, for each of them, the intervals in which the associated solution is optimal. These tableaus are



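Basis (1-2)    x1       x2       x3     | value
  x1           1        0        1      | b1(y)
  x2           0        1        1      | b2(y)
  d            0        0        d3(y)  |

Basis (1-3)    x1       x2       x3     | value
  x1           1       -1        0      | b1(y) - b2(y)
  x3           0        1        1      | b2(y)
  d            0      -d3(y)     0      |

Basis (2-3)    x1       x2       x3     | value
  x3           1        0        1      | b1(y)
  x2          -1        1        0      | -(b1(y) - b2(y))
  d          -d3(y)     0        0      |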



The formulas for the various polynomials are:

$$b_1(y) = 7(y-5)(y-6)$$

$$b_1(y) - b_2(y) = 10(y-3)(y-8)$$

$$d_3(y) = 4(y-4)(y-9)$$


The intervals in which the various bases are optimal are easily obtained. 
Basis Intervals in which solution is optimal 

1-2 [1,4] and [9,10] 

1-3 [8,9] 

2-3 [4,5] and [6,8] 
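The interval table can be checked numerically. The Python sketch below is a brute-force check, not the constructive method of this paper. At a given y it does the following:

- tries every pair of columns as a basis;
- computes the basic solution and the reduced costs $d = uA - c(y)$;
- reports the bases that are both feasible and dual feasible.

It assumes the constraints $x_1 + x_3 = b_1(y)$ and $x_2 + x_3 = b_2(y)$, with $b_1$, $b_1 - b_2$, and $d_3$ as given above and objective coefficient $c_3(y) = -d_3(y)$. With these data it reproduces the intervals in the table. The function names are hypothetical.

```python
import itertools

import numpy as np

# Assumed constraint matrix for the example: x1 + x3 = b1(y), x2 + x3 = b2(y).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])


def b(y: float) -> np.ndarray:
    b1 = 7 * (y - 5) * (y - 6)
    b2 = b1 - 10 * (y - 3) * (y - 8)   # from b1(y) - b2(y) = 10(y-3)(y-8)
    return np.array([b1, b2], dtype=float)


def c(y: float) -> np.ndarray:
    # Objective max -4(y-4)(y-9) x3, so that d3(y) = 4(y-4)(y-9) in basis (1-2).
    return np.array([0.0, 0.0, -4 * (y - 4) * (y - 9)])


def optimal_bases(y: float, tol: float = 1e-9):
    """Bases (1-indexed column pairs) that are primal and dual feasible at y."""
    found = []
    for cols in itertools.combinations(range(A.shape[1]), A.shape[0]):
        B = A[:, list(cols)]
        if abs(np.linalg.det(B)) < tol:
            continue                                   # singular: not a basis
        x_B = np.linalg.solve(B, b(y))                 # basic variable values
        u = np.linalg.solve(B.T, c(y)[list(cols)])     # prices from u B = c_B
        d = u @ A - c(y)                               # reduced costs d = uA - c(y)
        if np.all(x_B >= -tol) and np.all(d >= -tol):
            found.append(tuple(j + 1 for j in cols))
    return found


# Scan a few values of y and print the optimal bases at each.
for y in [0.0, 2.0, 4.5, 5.5, 7.0, 8.5, 9.5, 11.0]:
    print(y, optimal_bases(y))
```

A grid scan like this can miss isolated optimal points such as those in the degenerate example that follows. This is one reason a constructive method is needed.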

In this example there are no isolated points which yield optimal solutions, but such examples are easy to construct. For instance, let $d_3(y) = 1$, $b_2(y) = -(1-y)^2(3-y)^2$, and $b_1(y) = 1$. Then basis (1-2) yields optimal solutions for y = 1 and y = 3, and the other bases yield no optimal solutions. These solutions are degenerate. They suggest that special care must be taken to handle this particular kind of degeneracy in the constructive method now to be given.

The Algorithm 

The algorithms associated with conventional parametric programming are variants of the simplex method and can be paraphrased as follows.

1. Suppose that an optimal basic solution is available for $y = y_0$.
2. Increase y to a value $y_1$ with the property that the solution is not optimal when $y = y_1 + \delta$ for $\delta > 0$. If no such value $y_1$ exists, the process terminates and the current solution is optimal for $y \ge y_0$.
3. If $y_1$ can be found, then the solution is optimal in $[y_0, y_1]$. Perform simplex iterations until a solution is found which is optimal at $y_1 + \epsilon$ for some (small) $\epsilon > 0$.
4. Replace $y_0$ by $y_1$ and repeat the process.
5. If no such solution can be found for $y_1 + \epsilon$, the process terminates and there is no optimal solution for $y > y_1$.
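In the linear right-hand-side case of Theorem 2, the step "increase y to $y_1$" reduces to a ratio test on the current basic solution $x_B(y) = \beta_0 + y\beta_1$. Here $\beta_0$ and $\beta_1$ are the basis inverse applied to b and to $b_1$. The sketch below assumes the basis is feasible at $y_0$ and returns the largest $y_1$ up to which it stays feasible. The same computation applied to $d + y d_1$ gives the break point for the objective case of Theorem 1. The name `rhs_breakpoint` is hypothetical.

```python
import math
from typing import Sequence


def rhs_breakpoint(beta0: Sequence[float], beta1: Sequence[float]) -> float:
    """Largest y1 such that beta0 + y * beta1 >= 0 holds up to y1 (math.inf if it never fails).

    beta0 is the basic solution at y = 0 and beta1 its rate of change. Assumes the
    basis is feasible at the current y0, so every candidate -beta0[i] / beta1[i] is >= y0.
    """
    y1 = math.inf
    for b0, b1 in zip(beta0, beta1):
        if b1 < 0:                # this basic variable decreases as y increases
            y1 = min(y1, -b0 / b1)
    return y1


print(rhs_breakpoint([4.0, 1.0], [-2.0, 1.0]))
```

In the polynomial case treated here, the linear ratio test is replaced by a search for the smallest root of a set of polynomials, as described below.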

The path to be followed here is very much the same. The differences 
are that it is necessary to use both the primal and dual algorithms, it is 
necessary to "jump over" certain intervals, and the cases in which a 
basic solution is optimal in an interval must be distinguished from those 
in which a solution is optimal at an isolated point. 

The algorithm finds (when possible) solutions which are "strongly 
optimal." 

Definition: A function, p, is strongly nonnegative at a point, $y_0$, if it vanishes identically in some interval containing $y_0$, or if at $y_0$ its first nonvanishing derivative (including the zeroth derivative) is positive. This is written $p(y_0) \overset{+}{\ge} 0$. The fact that $p(y_0) \overset{+}{\ge} 0$ but p does not vanish identically in any interval containing $y_0$ is denoted by $p(y_0) \overset{+}{>} 0$.
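The definition can be tested mechanically for a polynomial given by its coefficients. In this sketch the coefficients are listed from the constant term upward, a layout assumed for illustration.

- `strongly_nonnegative` evaluates p and its successive derivatives at $y_0$ and returns the sign of the first one that does not vanish. A polynomial all of whose derivatives vanish is identically zero and counts as strongly nonnegative.
- `strongly_positive` adds the requirement that p not vanish identically.

The tolerance argument is an assumption for floating-point data, and all names are hypothetical.

```python
from typing import List, Sequence


def evaluate(coeffs: Sequence[float], y: float) -> float:
    """Horner evaluation of sum(coeffs[k] * y**k)."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * y + a
    return result


def derivative(coeffs: Sequence[float]) -> List[float]:
    """Coefficients of the derivative polynomial."""
    return [k * a for k, a in enumerate(coeffs)][1:]


def strongly_nonnegative(coeffs: Sequence[float], y0: float, tol: float = 1e-12) -> bool:
    p = list(coeffs)
    while p:
        value = evaluate(p, y0)
        if abs(value) > tol:
            return value > 0      # the first nonvanishing derivative decides
        p = derivative(p)
    return True                   # every derivative vanishes: p is identically zero


def strongly_positive(coeffs: Sequence[float], y0: float, tol: float = 1e-12) -> bool:
    """Strongly nonnegative at y0 and not identically zero."""
    return strongly_nonnegative(coeffs, y0, tol) and any(abs(a) > tol for a in coeffs)


print(strongly_nonnegative([4.0, -4.0, 1.0], 2.0))   # (y - 2)^2 at y0 = 2
print(strongly_nonnegative([2.0, -1.0], 2.0))        # 2 - y at y0 = 2
```

Looking at derivatives in this way examines p just to the right of $y_0$. This matches the algorithm below, which increases y.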

Definition: A solution to the linear programming problem given by (1)-(5) is strongly optimal at $y_0$ if the components of x(y) and d(y) are strongly nonnegative there.

It is clear that $p(y_0) \overset{+}{\ge} 0$ implies that $p(y) \ge 0$ in some interval $[y_0, y_1]$. Further, the relation $\overset{+}{\ge}$ is an order relation which may be substituted for the usual one in the simplex algorithms. Actually, both this relation and the conventional lexicographic ordering are used in certain of the simplex steps to follow.

The principal steps of the algorithm are exhibited in the flow chart shown in Fig. 1. The following paragraphs give a detailed commentary on the flow chart. They also prove that the algorithm it embodies yields:

- a strongly optimal solution when one exists;
- an optimal solution when one exists;
- suitable signals when no optimal solution exists.

It is also shown that the entire process is finite.

At the heart of the method are the two versions of the dual simplex method (I and III) and the two versions of the primal simplex method (II and IV). When any of them is used, a basic solution is available, and the problem is modified as necessary to ensure that only a finite number of simplex steps is required. The differences between the methods arise from the order relationship used. The descriptions are as follows:

I. The nonbasic components of d (i.e., the $d_j$ associated with nonbasic variables) are set equal to one. This gives a basic solution which is dual feasible. Then the standard dual simplex algorithm is employed with lexicographic ordering in an attempt to find a feasible solution to the given problem.

II. When this algorithm is used, either a feasible or a strongly feasible basic solution is available. The standard primal simplex algorithm is used with lexicographic ordering in an attempt to find an optimal solution to the given problem.

III. This algorithm is used when a basic feasible solution is available which may be dual infeasible, dual feasible, or strongly dual feasible. In practice it would be desirable to distinguish these cases, but it is not logically necessary. As in I, the nonbasic components of d are set equal to one to ensure that the problem is dual feasible. Then the dual simplex algorithm is used in an attempt to find a strongly feasible solution to the given problem. It uses the usual lexicographic order relationship for the dual variables and the strong lexicographic order relationship for the primal variables.

IV. This algorithm is used when an optimal and strongly feasible basic solution is available. The primal algorithm is used with the strong lexicographic order relationship for both primal and dual variables in an attempt to find a strongly optimal solution to the given problem.

The over-all strategy is as follows. Algorithms I and II determine the isolated points at which optimal solutions exist, while III and IV find the solutions which are (strongly) optimal in nondegenerate intervals.

We now turn to the other parts of the flow chart, whose purpose is to determine the intervals. In the boxes in the chart which have the label "Find $y_1$," one of the following tasks is to be performed:

- If the current basis is optimal for the current value $y_0$ of y, then it is necessary to find a value of $y_1 \ge y_0$ for which the basis is not strongly optimal.
- If, at $y_0$, the current basis contains the information which shows that no optimal (or strongly optimal) solution exists, then it is necessary to find a value of $y_1 \ge y_0$ for which there may be an optimal solution.

In each of these cases it is necessary to find the smallest root of a set of polynomials in some half interval. In conventional parametric programming one finds the smallest root of a set of linear functions in a half interval.
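Finding $y_1$ comes down to locating the smallest real root beyond $y_0$ among the polynomial components of x and d. The sketch below applies `numpy.roots` to each polynomial and keeps the smallest real root greater than $y_0$. The coefficients are again listed from the constant term upward, so they are reversed before the call. This is only a numerical stand-in:

- the paper does not specify a root finder;
- `numpy.roots` can be inaccurate for ill-conditioned or high-degree polynomials;
- whether $y_1 = y_0$ applies must still be settled by the strong-sign test at $y_0$.

The function name and the tolerance are assumptions.

```python
from typing import Optional, Sequence

import numpy as np


def next_breakpoint(polys: Sequence[Sequence[float]], y0: float,
                    tol: float = 1e-9) -> Optional[float]:
    """Smallest real root greater than y0 among the given polynomials, or None."""
    best = None
    for coeffs in polys:
        p = np.trim_zeros(np.asarray(coeffs, dtype=float), "b")  # drop zero leading terms
        if p.size <= 1:
            continue                              # constants have no isolated roots
        for root in np.roots(p[::-1]):            # numpy.roots expects highest degree first
            if abs(root.imag) <= tol and root.real > y0 + tol:
                if best is None or root.real < best:
                    best = float(root.real)
    return best


# d3(y) = 4(y - 4)(y - 9) from the example, written as 144 - 52y + 4y^2:
print(next_breakpoint([[144.0, -52.0, 4.0]], 0.0))
```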
We now follow the algorithm in detail. (The complete algorithm as given examines values of y greater than $y_0$. An obvious redefinition of strong ordering allows values of y less than $y_0$ to be treated.) To initiate the calculation, a value of $y_0$ and a basic solution are selected, and algorithm I is executed.

If algorithm I terminates with a feasible solution to the given problem, then proceed to algorithm II. If it does not, then for some row, r, $b_r(y_0) < 0$ and all entries in the transformed row of A are nonnegative. Choose $y_1$ so that $y_1 > y_0$, $b_r(y_1) = 0$, and $b_r(y) < 0$ for $y_0 \le y < y_1$. There is no feasible solution in $\{y_0, y_1)$, and algorithm I must be used again with $y_1$ replacing $y_0$. If $b_r(y) < 0$ for all $y > y_0$, then there is no feasible solution in $\{y_0, \infty)$.

The value of "{" is either "[" or "(". It is assigned one of these values at several points in the flow chart; initially it has the value "[". The intervals in which optimal solutions do not exist are open. However, one of these intervals may arise as the union of several subintervals. Evidently some of these subintervals must be closed on the left. The value of "{" is chosen so as to mark each subinterval correctly as open or closed on the left.

Algorithm II terminates either with an optimal solution or with a signal that an unbounded solution exists.

If an optimal solution is found, then choose $y_1$ as the smallest value of $y \ge y_0$ for which either $b_i(y) \overset{+}{<} 0$ or $d_j(y) \overset{+}{<} 0$.

- If $y_1$ cannot be determined, then the current solution is optimal in $[y_0, \infty)$.
- Otherwise, it is optimal in $\{y_0, y_1]$, and if $y_0 < y_1$ it is strongly optimal in $[y_0, y_1)$.
- If $b_i(y_1) \overset{+}{<} 0$, then replace $y_0$ by $y_1$ and proceed to algorithm III; otherwise replace $y_0$ by $y_1$ and go to algorithm IV.

If algorithm II does not produce an optimal solution, then for some column s, $d_s(y_0) < 0$, and all elements in the transformed column of A are nonpositive. Choose $y_1$ as the smallest value of $y \ge y_0$ such that either $d_s(y_1) = 0$ or $b_i(y_1) \overset{+}{<} 0$ for some i. There is no optimal solution in $\{y_0, y_1)$.

- If $b_i(y_1) \overset{+}{<} 0$, then replace $y_0$ by $y_1$ and go to algorithm III. (Note that it is possible that $y_1 = y_0$.)
- Otherwise replace $y_0$ by $y_1$ and return to algorithm II. (In this case it is necessary that $y_1 > y_0$.)
- If $y_1$ cannot be so chosen, then there is no optimal solution in $\{y_0, \infty)$.

The choices upon the termination of algorithm III are precisely the same as those in algorithm I. Similarly, the choices upon the termination of algorithm IV are the same as those in algorithm II. The difference between II and IV is that when $y_1$ is chosen in IV, it is always true that $y_1 > y_0$. (It is quite easy to verify that $y_1 = y_0$ is impossible, because all basic solutions which are considered are optimal and strongly feasible.)

To prove that the entire process is finite, it is sufficient to show that, as successive values of $y_1$ are determined, $y_1 = y_0$ cannot arise indefinitely, since it is known that each subalgorithm is finite and the possible values of $y_1$ are the zeros of a finite set of polynomials. The cases where $y_1 = y_0$ is possible arise upon the completion of algorithm II when the path is directed to algorithm III or to algorithm IV. In the latter path we reach a point where $y_1 > y_0$. In the former path, we either reach such a point on the completion of algorithm III or we are directed back again to algorithm II. Upon emerging from algorithm II this time, we either go to algorithm IV or return to algorithm II still again via the path which ensures that $y_1 > y_0$.

Comments on the Computation 

When the simplex algorithms III and IV are used, it is necessary to have the derivatives of the components of $d$ and $x$ evaluated at a point, $y_0$. If the derivatives (at $y_0$) associated with one basis are known, then to calculate the derivatives (at $y_0$) associated with subsequent bases one simply uses the ordinary simplex transformations. The derivatives (at $y_0$) associated with a basis and a new value of $y_0$ are probably most easily calculated by multiplying the derivatives of the original quantities by the basis inverse.
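The derivatives at $y_0$ are what decide a test such as $b_i(y) <^{+} 0$. Reading that notation as "negative on some interval $(y_0, y_0 + \varepsilon)$" (our interpretation), a polynomial satisfies it exactly when the first nonzero value among $p(y_0), p'(y_0), p''(y_0), \ldots$ is negative. The following is a minimal Python sketch under that reading; the ascending coefficient lists and the function names are our own choices, and exact `Fraction` arithmetic is used so that the zero tests are not disturbed by rounding.

```python
from fractions import Fraction
from typing import List, Sequence

def derivative(coeffs: Sequence) -> List:
    """Derivative of a polynomial given by ascending coefficients [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs: Sequence, y: Fraction) -> Fraction:
    """Horner evaluation of the polynomial at y."""
    value = Fraction(0)
    for c in reversed(coeffs):
        value = value * y + c
    return value

def negative_to_the_right(coeffs: Sequence, y0: Fraction) -> bool:
    """True if the polynomial is negative on some interval (y0, y0 + eps).

    By Taylor's theorem this is decided by the sign of the first nonzero value
    among p(y0), p'(y0), p''(y0), ...; the zero polynomial is never negative.
    """
    p = list(coeffs)
    while p:
        v = evaluate(p, y0)
        if v != 0:
            return v < 0
        p = derivative(p)          # value is zero: look at the next derivative
    return False
```

For example, $y^2 - y$ vanishes at $0$ but its first derivative there is $-1$, so it is negative just to the right of $0$, while $y^2$ is not.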

To calculate the values of $y_1$ it is necessary to find the smallest root
(in a half interval) of the components of d and x. To do this it is con- 
venient to express the components as polynomials. This is probably most 
easily done by multiplying the original coefficients by the basis inverse. 
Thus we see that two forms are required for the different parts of the 
algorithm. 
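There are several ways to find the smallest root in a half interval. The sketch below uses one standard scheme, which is our choice and not necessarily the authors'. The real roots of $p'$ cut the interval into pieces on which $p$ is monotone, so each piece holds at most one root, and bisection then locates it. The Cauchy bound $1 + \max_i |c_i / c_n|$ closes the half interval $(y_0, \infty)$. Floating-point arithmetic, the tolerance `tol` and all function names are assumptions of the sketch.

```python
from typing import List, Optional, Sequence

def poly_eval(c: Sequence[float], y: float) -> float:
    """Horner evaluation; c holds ascending coefficients [c0, c1, ...]."""
    value = 0.0
    for a in reversed(c):
        value = value * y + a
    return value

def poly_deriv(c: Sequence[float]) -> List[float]:
    return [k * a for k, a in enumerate(c)][1:]

def trim(c: Sequence[float]) -> List[float]:
    out = list(c)
    while out and out[-1] == 0:     # drop zero leading coefficients
        out.pop()
    return out

def real_roots(c: Sequence[float], lo: float, hi: float, tol: float = 1e-12) -> List[float]:
    """Real roots of the polynomial in [lo, hi], in increasing order.

    The roots of p' cut [lo, hi] into pieces on which p is monotone, so each
    piece holds at most one root; a sign change on a piece is located by
    bisection, and roots sitting on the cut points (e.g. double roots) are
    detected by testing the cut points directly.
    """
    c = trim(c)
    if len(c) <= 1:                 # constant polynomial: no isolated roots
        return []
    cuts = [lo] + real_roots(poly_deriv(c), lo, hi, tol) + [hi]
    roots = [x for x in cuts if abs(poly_eval(c, x)) <= tol]
    for a, b in zip(cuts, cuts[1:]):
        fa, fb = poly_eval(c, a), poly_eval(c, b)
        if abs(fa) > tol and abs(fb) > tol and fa * fb < 0:
            for _ in range(100):    # bisection on a monotone piece
                mid = 0.5 * (a + b)
                fm = poly_eval(c, mid)
                if fa * fm <= 0:
                    b = mid
                else:
                    a, fa = mid, fm
            roots.append(0.5 * (a + b))
    roots.sort()
    return [r for i, r in enumerate(roots) if i == 0 or r - roots[i - 1] > tol]

def smallest_root_after(c: Sequence[float], y0: float, tol: float = 1e-12) -> Optional[float]:
    """Smallest root of the polynomial in the half interval (y0, infinity), or None."""
    c = trim(c)
    if len(c) <= 1:
        return None
    bound = 1.0 + max(abs(a / c[-1]) for a in c[:-1])   # Cauchy bound: every |root| < bound
    if y0 >= bound:
        return None
    for r in real_roots(c, y0, bound, tol):
        if r > y0 + tol:
            return r
    return None
```

For $(y-1)(y-3)$, entered as `[3, -4, 1]`, the sketch returns a value close to 1 when started from $y_0 = 0$ and a value close to 3 when started from $y_0 = 1$.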

Extensions of the Method 

Both theorem 3 and the algorithm can be extended to a wider class of 
functions. Functions which form a finite dimensional vector space and for 
which the range of the derivative operator is contained in the vector space 
are admissible. In order that the algorithm terminate in a finite number of 
steps, only intervals (on the parameter axis) in which the functions of the 
vector space have a finite number of roots can be considered. Thus finite 
Fourier series can be treated over finite intervals which is, of course, no 
restriction. In some situations it is desirable to consider costs or re- 
quirements which are rational functions in intervals in which no singu- 
larities occur. Here one may multiply the relevant set of coefficients 
by the least common multiple of their denominators and transform the 
problem into one in which the coefficients are polynomials. Then it is 
only necessary to divide by this factor to get the results as rational 
functions of the parameter. 
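Clearing denominators can be sketched as follows. The sketch assumes that each coefficient is given as a numerator polynomial and a denominator polynomial, stored as ascending coefficient lists (our own representation). The least common multiple is formed with the Euclidean algorithm for polynomials, $\operatorname{lcm}(p, q) = pq / \gcd(p, q)$, and each numerator is multiplied by the cofactor $L / q_i$. This is an illustrative sketch, not code from the paper.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Poly = List[Fraction]  # ascending coefficients: [c0, c1, ...] means c0 + c1*y + ...

def trim(p: Sequence) -> Poly:
    q = [Fraction(c) for c in p]
    while q and q[-1] == 0:
        q.pop()
    return q

def mul(p: Sequence, q: Sequence) -> Poly:
    p, q = trim(p), trim(q)
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def divmod_poly(p: Sequence, q: Sequence) -> Tuple[Poly, Poly]:
    """Polynomial long division: p = quot * q + rem with deg rem < deg q."""
    rem, q = trim(p), trim(q)
    if not q:
        raise ZeroDivisionError("division by the zero polynomial")
    quot = [Fraction(0)] * max(len(rem) - len(q) + 1, 1)
    while len(rem) >= len(q):
        shift = len(rem) - len(q)
        factor = rem[-1] / q[-1]
        quot[shift] = factor
        for i, b in enumerate(q):
            rem[i + shift] -= factor * b
        rem = trim(rem)             # the leading term cancels exactly
    return trim(quot), rem

def gcd_poly(p: Sequence, q: Sequence) -> Poly:
    """Monic greatest common divisor by the Euclidean algorithm."""
    a, b = trim(p), trim(q)
    while b:
        a, b = b, divmod_poly(a, b)[1]
    return [c / a[-1] for c in a] if a else []

def lcm_poly(p: Sequence, q: Sequence) -> Poly:
    quot, _ = divmod_poly(mul(p, q), gcd_poly(p, q))
    return [c / quot[-1] for c in quot]          # normalised to be monic

def clear_denominators(numerators, denominators):
    """Return L = lcm(q_1, ..., q_k) and the polynomials p_i * (L / q_i)."""
    L: Poly = [Fraction(1)]
    for q in denominators:
        L = lcm_poly(L, q)
    cleared = [mul(p, divmod_poly(L, q)[0]) for p, q in zip(numerators, denominators)]
    return L, cleared
```

Because no denominator vanishes on the interval considered, $L$ keeps one sign there; if that sign is negative, multiplying by $-L$ instead preserves the direction of the inequalities and of the optimization.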

Applications 

One of the most fruitful areas for applications arises in those situations 
where the firm faces a market which is not perfectly elastic; that is, in 
cases where average price depends on the amounts sold or average cost 
depends on the amount purchased. In a linear programming model of a firm purchasing raw material in the amount $Q$, at a unit cost of $C$, it might be the case that $Q = C^{1.2}$. To convert the model to the proper form, it is necessary to replace $Q$ in the requirements vector by $y^6$ and $C$ in the cost vector by $y^5$.
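The exponents in this substitution can be checked directly, taking $y > 0$ so that $C = y^5$ is positive:

$$C = y^{5} \quad\Longrightarrow\quad Q = C^{1.2} = \left(y^{5}\right)^{6/5} = y^{6}.$$

Both the cost and the requirement are then polynomials in $y$, which is the form the method needs, and tracing the solution in $y$ traces it in $C$.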
This problem can be handled with the usual simplex method by examining a number of discrete cases, but the technique given here reveals precisely the nature of the solution. A very similar situation prevails in problems related to cash budgeting, where the amounts of funds available as well as the cost coefficients depend on the interest rate. In the first example one might trace out the value of the functional and select the value of $y$ (and hence of $C$) which optimizes it. The second example merely permits a sophisticated sensitivity analysis.

It is possible to "block out" arbitrary open intervals on the parameter axis simply by adding an equation of the form $x_{n+1} = p(y)$, where $p$ is a polynomial which is negative in the blocked-out intervals and nonnegative elsewhere.
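One way to build such a $p$, offered here as our own illustration: for finitely many pairwise disjoint bounded intervals $(a_k, b_k)$, take $p(y) = \prod_k (y - a_k)(y - b_k)$. Each quadratic factor is negative only inside its own interval, so at most one factor is negative at any $y$, and $p$ is negative exactly on the union of the intervals. Since $x_{n+1} \ge 0$, the added equation makes the problem infeasible there. The following is a short Python sketch using ascending coefficient lists; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def blocking_polynomial(intervals: Sequence[Tuple[float, float]]) -> List[float]:
    """Ascending coefficients of p(y) = prod_k (y - a_k)(y - b_k).

    For pairwise disjoint open intervals (a_k, b_k), p < 0 exactly on their
    union and p >= 0 elsewhere.
    """
    coeffs = [1.0]
    for a, b in intervals:
        quad = [a * b, -(a + b), 1.0]        # (y - a)(y - b) = ab - (a + b) y + y^2
        out = [0.0] * (len(coeffs) + 2)
        for i, c in enumerate(coeffs):
            for j, d in enumerate(quad):
                out[i + j] += c * d
        coeffs = out
    return coeffs

def evaluate(coeffs: Sequence[float], y: float) -> float:
    value = 0.0
    for c in reversed(coeffs):
        value = value * y + c
    return value
```

For the intervals $(1, 2)$ and $(4, 5)$, the polynomial is negative at $1.5$ and $4.5$, positive at $0$, $3$ and $6$, and zero at the endpoints.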

At the extreme, it is possible to block out everything but isolated 
points. Since the values of the costs and requirements at these points can 
be set arbitrarily by choosing the appropriate polynomials properly, a suc- 
cession of problems with distinct costs and requirements may be solved. 
Needless to say, this approach to discrete programming is not practical 
but it does illustrate the generality of the method. 
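The construction in this paragraph can be sketched with two polynomials, both our own illustrative choices. First, $p(y) = -\prod_k (y - t_k)^2$ vanishes at the chosen points $t_k$ and is negative elsewhere, so the equation $x_{n+1} = p(y)$ leaves only those points feasible. Second, the Lagrange interpolating polynomial supplies a cost or requirement coefficient that takes any prescribed value at each $t_k$. Exact fractions are used throughout, and the names are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence

Poly = List[Fraction]   # ascending coefficients

def poly_mul(p: Sequence[Fraction], q: Sequence[Fraction]) -> Poly:
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def evaluate(p: Sequence[Fraction], y) -> Fraction:
    value = Fraction(0)
    for c in reversed(p):
        value = value * y + c
    return value

def isolating_polynomial(points: Sequence) -> Poly:
    """-prod (y - t)^2: zero at each t and negative everywhere else."""
    p: Poly = [Fraction(-1)]
    for t in points:
        t = Fraction(t)
        p = poly_mul(p, [t * t, -2 * t, Fraction(1)])    # factor (y - t)^2
    return p

def interpolate(points: Sequence, values: Sequence) -> Poly:
    """Lagrange polynomial through (points[k], values[k]); points must be distinct."""
    pts = [Fraction(t) for t in points]
    result: Poly = [Fraction(0)] * len(pts)
    for k, (tk, vk) in enumerate(zip(pts, values)):
        basis: Poly = [Fraction(1)]
        denom = Fraction(1)
        for j, tj in enumerate(pts):
            if j != k:
                basis = poly_mul(basis, [-tj, Fraction(1)])  # factor (y - t_j)
                denom *= tk - tj
        for i, c in enumerate(basis):
            result[i] += Fraction(vk) * c / denom
    return result
```

With points $1, 2, 4$ and prescribed costs $3, 5, -1$, the interpolant reproduces those costs, and the isolating polynomial is zero at the three points and negative at, for example, $y = 3$.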
Computational Efficiency in Product Form LP Codes



David M. Smith 
William Orchard-Hays 

The superiority of carrying the inverse of the basis in product form for 
linear programming algorithms depends on: 1) having a sparse, packed, 
original matrix, 2) obtaining as small a number of nonzeros as possible 
when reinverting the basis, and 3) using an optimum reinversion fre- 
quency. These considerations are important because, when using the 
product form, recomputation of tableau entries as they are required is 
substituted for the storage and maintenance of a complete, current tableau. 
As demonstrated in the SCEMP tests of the SHARE Standard Test Prob- 
lems [1] the product form (with optimal reinversion) required fewer 
operations than the standard form in all cases where the structural 
matrix was comprised of less than 50 per cent nonzero elements. 

However, when the standard form is used, more data are available when 
the next vector to enter the basis is being chosen. If this information were 
profitably employed, it should be possible to reduce the total number of 
iterations and, consequently, the number of operations required to solve 
a problem. Better digital accuracy would also be maintained. In particular, 
it has long been suggested that the vector chosen to enter the basis 
should make the greatest possible change in the objective function, rather 
than only produce the greatest rate of change. In this volume, an algo- 
rithm [2] has been disclosed for obtaining this choice with little increase 
in the number of operations required in standard form calculations. On 
the other hand, choosing the vector of maximum change when using a 
product form code would require such an increase in operations as to be 
completely impractical. 

Fortunately, a modest step in this direction is applicable to the product 
form; in fact, even at the same number of iterations it may reduce the 
operation count. A version of this technique has been coded, tested, and 
is incorporated in the current SHARE version of LP/90. In this code, the 
usual product form algorithm has been modified to select the two vectors 
producing the greatest rate of change in the objective (hence the name, 
"Double Pricing"), update both vectors, and compute in each case the 
extent of the change in objective. Then an iteration is performed with the 
better of the two. To date, the number of operations per iteration has been increased by about 25 per cent; however, the vector that was not used is transformed by the last step as in the standard form. If its rate of change is still desirable, the second vector is also introduced into the basis, making a "Double" iteration with almost no additional calculation. Since about 60 to 80 per cent of the second vectors were used in the SHARE problems, this resulted on the average in only 1.25/1.65 = 0.76 times the usual number of operations per iteration.
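The selection step of double pricing can be sketched as follows for a minimization problem. This is our illustration, not the LP/90 code. It takes the two most negative reduced costs $d_j$, performs the usual minimum-ratio test on each updated column, and keeps the column whose full step gives the larger decrease $d_j \theta_j$ in the objective. The `columns` callback (standing in for the forward transformation of a column through the product form) and the returned tuple are assumptions. The follow-up "Double" iteration with the second column is not shown.

```python
from typing import Callable, Optional, Sequence, Tuple

def ratio_test(x_b: Sequence[float], column: Sequence[float]) -> Tuple[Optional[int], float]:
    """Minimum ratio x_i / a_i over a_i > 0; returns (pivot row, step length)."""
    best_row, best_step = None, float("inf")
    for i, (x, a) in enumerate(zip(x_b, column)):
        if a > 0 and x / a < best_step:
            best_row, best_step = i, x / a
    return best_row, best_step

def double_price(d: Sequence[float],
                 columns: Callable[[int], Sequence[float]],
                 x_b: Sequence[float]) -> Optional[Tuple[int, Optional[int], float]]:
    """Return (entering column, pivot row, objective change), or None if optimal.

    columns(j) must give column j expressed in terms of the current basis.
    A pivot row of None signals an unbounded direction.
    """
    candidates = sorted((j for j in range(len(d)) if d[j] < 0), key=lambda j: d[j])[:2]
    best = None
    for j in candidates:
        row, step = ratio_test(x_b, columns(j))
        if row is None:
            return j, None, float("-inf")        # objective unbounded along column j
        change = d[j] * step                     # total change for the full step
        if best is None or change < best[2]:
            best = (j, row, change)
    return best
```

With $d = (-3, -1, 2)$, $x_B = (1, 10)$ and updated columns $(1, 1)$ and $(0, 1)$, the most negative $d_j$ allows only a step of 1 (change $-3$), while the second candidate allows a step of 10 (change $-10$), so the sketch picks the second.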

Having two vectors expressed explicitly in terms of the current basis 
has an additional advantage in that it is less expensive to reject a vector 
from consideration on this iteration since its replacement may be on 
hand. The usual reason for rejecting a vector is that its pivot element is 
small enough that digital accuracy might be impaired by the indicated 
change of basis. Under these conditions it is better to select another 
vector, if possible, to enter the basis. 

Another reason for vector rejection is to prevent nonmonotonic be- 
havior of the sum of infeasibilities when using an inverse weighting 
function to drive out infeasibilities. The usual technique in approaching a 
feasible solution has been to select the vector to enter the basis having the 
greatest rate of change in the sum of infeasibilities. Although this choice is guaranteed not to increase the total infeasibility, in many problems fewer iterations are required if the criterion for vector selection is the reduction in the number of infeasibilities rather than the amount. Such a choice function is obtained by weighting heavily the rows with the smallest infeasible value; that is, weighting the rows by the reciprocal of the value. Using this procedure, it is possible to select a vector which would increase the total amount of infeasibility. If it entered the basis without removing an infeasibility, cycling would be possible. Using the double pricing procedure, such vectors are cheaply rejected. The ratio test for the vector to leave the basis proposed by P. Wolfe [3] (which produces the greatest possible reduction in infeasibility) is modified to prevent the creation of a new infeasibility.
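A sketch of the inverse weighting, under our own conventions. Basic values $x_i < 0$ are infeasible, and raising a nonbasic column by $\theta$ moves $x_i$ to $x_i - \theta \alpha_{ij}$. The rate of change of the total infeasibility $\sum_{x_i<0} (-x_i)$ is therefore $\sum_{x_i<0} \alpha_{ij}$. Weighting row $i$ by $1/|x_i|$ favours columns that clear small infeasibilities. Updated columns are passed in explicitly for clarity, whereas a product form code would price with a row vector instead. The rejection of unsuitable vectors and Wolfe's modified ratio test are not shown, and the function name is hypothetical.

```python
from typing import Optional, Sequence

def weighted_phase1_choice(x_b: Sequence[float],
                           columns: Sequence[Sequence[float]],
                           inverse_weights: bool = True) -> Optional[int]:
    """Entering column for driving out infeasibilities, or None if none helps.

    columns[j][i] is entry i of column j in terms of the current basis; the
    rate for column j is sum over infeasible rows i of w_i * columns[j][i],
    with w_i = 1 / |x_b[i]| (inverse weighting) or w_i = 1 (plain sum).
    """
    infeasible = [i for i, x in enumerate(x_b) if x < 0]
    best_j, best_rate = None, 0.0
    for j, col in enumerate(columns):
        rate = sum((1.0 / -x_b[i] if inverse_weights else 1.0) * col[i] for i in infeasible)
        if rate < best_rate:                 # most negative rate of change wins
            best_j, best_rate = j, rate
    return best_j
```

With infeasibilities $x = (-10, -1)$ and columns $(-2, 0)$ and $(0, -1)$, the plain sum prefers the first column, which reduces the larger amount, while inverse weighting prefers the second, which can remove the small infeasibility outright.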

When the second vector is used, the reduced number of iterations ex- 
pected as a result of the choice of the first vector is not always achieved. 
The comparison runs examined to date range from a 50 per cent reduction 
to a 90 per cent increase in the total number of iterations required. An 
increase of as much as 25 per cent in the number of iterations could be 
tolerated without increasing over-all solution time because of the com- 
putational efficiency of Double Pricing. The average effect in the SHARE 
test problems was a 10 per cent decrease in the number of iterations. 

These efficiency improvements in the number of iterations and the 
percentage reduction in operations per iteration are independent of, and 
in addition to, control of the build up of nonzero elements in the product 
form of the inverse. The original work of H. Markowitz† [4] in this connection formed the basis of the two special techniques used in LP/90: an optimum reinversion policy, and a special pivot choice to reduce the number of nonzeros generated. As to the first item, the reinversion point is computed dynamically so as to maintain the average time per iteration (including the time for reinversion) at a minimum. This is accomplished by measuring the elapsed time since the start of the last reinversion on an on-line clock and computing the gross average time per iteration. If no control were exercised, the average would first diminish, then reach a minimum, and finally start to increase. When an increase in average time greater than 1 per cent is detected, and the number of iterations between inversions is within 25 per cent of the last number, reinversion is started. With this policy, the time spent in the inversion algorithm was about 15 per cent of the total computing time (including the time for reinversion).

†The complete Markowitz pivot selection technique was implemented on the JOHNNIAC in 1955 by one of the authors, but it was so complicated that no further attempts have been made to code the complete selection procedure. The JOHNNIAC code was limited to 128 rows and the storage devices were particularly suitable; it was extremely efficient.
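The reinversion rule can be sketched as a small stateful check called after each iteration. Two points of the text are open to interpretation and are treated here as assumptions. The 1 per cent rise is measured against the smallest gross average seen since the last reinversion began. "Within 25 per cent of the last number" is read as "at least 75 per cent of the previous interval". The class name and its method are hypothetical.

```python
from typing import Optional

class ReinversionPolicy:
    """Hypothetical sketch of a dynamic reinversion rule (our reading of the text)."""

    def __init__(self, rise: float = 0.01, window: float = 0.25) -> None:
        self.rise = rise                  # 1 per cent rise in the gross average
        self.window = window              # 25 per cent window on the interval length
        self.last_interval: Optional[int] = None
        self.best_average = float("inf")

    def should_reinvert(self, iterations: int, elapsed: float) -> bool:
        """Call after each iteration; `iterations` and `elapsed` (which includes
        the inversion time) are both counted from the start of the last
        reinversion.  Returns True when a new reinversion should start."""
        average = elapsed / iterations                     # gross average time per iteration
        self.best_average = min(self.best_average, average)
        rising = average > (1.0 + self.rise) * self.best_average
        long_enough = (self.last_interval is None
                       or iterations >= (1.0 - self.window) * self.last_interval)
        if rising and long_enough:
            self.last_interval = iterations
            self.best_average = float("inf")               # restart tracking after reinversion
            return True
        return False
```

Suppose, hypothetically, that an inversion takes 10 time units and the $k$-th iteration after it costs $1 + 0.1k$ units as the eta file grows. The average is then smallest near $k = 14$, and the check fires at $k = 18$. In such a model, the best interval is about $\sqrt{2I/b}$ for inversion time $I$ and growth rate $b$. The inversion time per iteration, $I/k$, therefore varies as $\sqrt{I}$, which is consistent with the square-root effect described in the next paragraph.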

The effect of speeding up the inversion algorithm is two-fold. First, 
the total time spent in the inversion algorithm will be reduced, but only 
by the square root of the speed ratio, since the more efficient algorithm 
will be used with greater frequency. Secondly, the time per iteration 
subsequently is also diminished in proportion to the reduction in average 
density of the product form. The very simple technique derived from the 
Markowitz method for inversion speed is to count the nonzeros in each row of the 
basis to be inverted, and to decrement the counts at each step so as to re- 
main consistent with the counts of the as-yet-untransformed columns. The 
inversion agenda is then: take the next vector in original order (it is usually better to have the sparsest columns first), and choose as pivot that admissible row with least nonzero count.†
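The inversion agenda can be sketched as below, following our reading of the rule. Dense Python lists are used for readability, even though the point of the method is sparse storage. Row counts start as the number of nonzeros in each row of the basis. The current column's entries are removed from the counts as it is taken, so the counts refer to the columns not yet transformed. Among the rows not yet used whose transformed entry is nonzero, the one with the smallest count becomes the pivot. The eta representation, the tolerance and the singular-basis error are assumptions.

```python
from typing import List, Sequence, Tuple

Eta = Tuple[int, List[float]]   # (pivot row, transformed column at pivot time)

def ftran(etas: Sequence[Eta], col: Sequence[float]) -> List[float]:
    """Apply the eta file (product form of the inverse) to a column."""
    v = list(col)
    for r, eta in etas:
        piv = v[r] / eta[r]
        for i, e in enumerate(eta):
            v[i] = piv if i == r else v[i] - e * piv
    return v

def invert(basis_cols: Sequence[Sequence[float]], tol: float = 1e-12) -> List[Eta]:
    """Product-form inversion taking columns in the given order and pivoting on
    the admissible row with the fewest nonzeros in the untransformed columns."""
    m = len(basis_cols)
    row_count = [sum(1 for col in basis_cols if abs(col[i]) > tol) for i in range(m)]
    used = [False] * m
    etas: List[Eta] = []
    for col in basis_cols:
        for i in range(m):                   # this column is no longer untransformed
            if abs(col[i]) > tol:
                row_count[i] -= 1
        v = ftran(etas, col)                 # transform by the etas built so far
        admissible = [i for i in range(m) if not used[i] and abs(v[i]) > tol]
        if not admissible:
            raise ValueError("basis is singular")
        r = min(admissible, key=lambda i: row_count[i])
        used[r] = True
        etas.append((r, v))
    return etas
```

After inversion, transforming each basis column through the eta file gives the unit vector of its pivot row, which is what the product form requires.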

Records were kept in the SHARE problems of basic, structural non- 
zeros and product form nonzeros both before and after each inversion. 
They are summarized in Appendix II. The maximum number of nonzeros 
reached was four times the number in the original matrix. Reinversion 
reduced the entries in product form to between 0.5 and 1.7 times the nonzeros 
in the matrix. These densities correspond to 2-3 times that of the actual 
basis inverted. Substituting these values in the approximating formulas 
given in Appendix I for the number of operations gives a reasonably 
close check with the actual operation counts recorded in the SCEMP test 
runs. 

The total running time for the 13 SHARE problems (starting from a 
feasible basis in IB and IV-A) was about 35 minutes on the present distribution of LP/90 (Version 131), which incorporates all the features described in this paper. The time was divided as follows:

Iterations          25.0 minutes
Inversions           4.1
Input & System       3.7
Output               2.0
Total               34.8 minutes



†This particular adaptation of the Markowitz technique was first published by Zoutendijk [5]. It was then coded for the IBM 7090 by Larsen
of Esso Research and Engineering [6], based on a design of one of the 
authors. This code was released to C-E-I-R for inclusion in the SHARE 
version and after certain revisions is now incorporated in LP/90. 
The compute time (iterations and inversions only) on these same problems with the first delivery of LP/90 (Version 99) was about 85 minutes† [7], but problem IV-A was not completed. We estimate that 10 to 15
minutes more would have been required to reach a solution, so that the 
addition of these efficiency improvements has tripled the average speed of 
the code on these problems. In our other work, one problem has been 
found which ran 10 per cent longer; most problems run in slightly less than 
half the original time. 

APPENDIX 

I. Approximating Formulas for the average number of operations per iteration:

A. Product Form Single Pricing

Reverse transformations: $\frac{B}{2m}\,T$

Obtaining reduced costs: $\frac{B}{m}\,M$

Forward transformations: $\frac{B}{2m}\,T$

Total operations: $P = \frac{B}{m}\,(T + M)$

where m = number of rows
B = number of non-slack vectors in basis
T = number of nonzero transformation elements
M = number of nonzero matrix elements
P = number of operations

In the SHARE problems total operations/iteration computed by the 
formula above ranged from 1.0 to 3.2 times M with the larger, 
sparser, problems having the higher values. 

B. Standard Form (nonzero operations only)

Total operations: $E = \left(\frac{B}{m}\right)^{2} mC$

where C = number of nonbasic columns
E = number of operations



†All versions of LP/90 operate with double precision arithmetic. An
intermediate code, Version 103, was distributed to SHARE in October 
1961. This version incorporated all features of the present code except 
Double Pricing, but owing to clock failure we have no accurate times for 
its test runs. We estimate that it lies midway between Versions 99 and 
131 in speed. 




In the SHARE problems this formula gave values from 1.1 to 
18.6 times M; again the larger problems had the larger values. 

The spread of ratios of the formula values (Standard : Product) 
was from 1.2 to 3.8 except for the large, sparse 245 row matrix 
(Problem IV-A) whose ratio was 5.8. Except for problems III-A, III-B, and IV-A, the actual ratios of operation counts were computed from the SCEMP report; these values ranged from 1.05 to 2.9. The check between calculated and estimated ratios was considered sufficiently accurate considering the many variables not included. The trend, as expected, was toward larger ratios in sparse problems.

C. Product Form Double Pricing 

Reverse transformations: $\frac{B}{2m}\,T$

Obtaining reduced costs: $\frac{B}{m}\,M$

Forward transformations: $\frac{B}{m}\,T$

Total operations: $D = \frac{B}{m}\left(\frac{3T}{2} + M\right)$

The increase of BT/2m for double over single pricing is esti- 
mated to require only about 25 per cent more operations for a double 
iteration than a single, since certain housekeeping and data trans- 
mission are not changed. 

Thus, total operations per iteration = 1.25/1.65 = 0.76 of the 
operations in single pricing if 65 per cent of the second vectors are 
used. 
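The formulas of Appendix I and the 0.76 figure can be checked with a few lines of Python. The functions simply transcribe formulas A and C term by term. In the check, the values $m = 34$, $B = 31$ and $M = 245$ come from problem IA, while $T = 170$ is a hypothetical eta count chosen for illustration.

```python
def single_pricing_ops(m: int, B: int, T: float, M: float) -> float:
    """Formula A: reverse (B/2m)T + reduced costs (B/m)M + forward (B/2m)T."""
    return B / (2 * m) * T + B / m * M + B / (2 * m) * T

def double_pricing_ops(m: int, B: int, T: float, M: float) -> float:
    """Formula C: reverse (B/2m)T + reduced costs (B/m)M + two forward
    transformations, (B/m)T."""
    return B / (2 * m) * T + B / m * M + B / m * T

def relative_ops_per_iteration(extra: float, used_fraction: float) -> float:
    """Cost of a double-priced step, 1 + extra, spread over the 1 + used_fraction
    iterations it produces, relative to one single-pricing iteration."""
    return (1.0 + extra) / (1.0 + used_fraction)
```

With 60 and 80 per cent of the second vectors used, the same expression gives $1.25/1.60 \approx 0.78$ and $1.25/1.80 \approx 0.69$.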
II. Summary of Test Run Results

Problem                             IA      IB      IC      ID      IE

Description
  Rows                              34     118       5      27      39
  Columns (Structural)              64     225       9      20     100
  Matrix Nonzeros (Structural)     245    1210      55     232     830

Average Values
  Non-unit Vectors in Basis         31     103       5      25      26
  Nonbasic Columns                  33     150       9      20      80
  Basis Nonzeros                   135     600      25     150     200
  Eta Nonzeros before INVERT       200    3630       -       -    1215
  Eta Nonzeros after INVERT        135    1700       -       -     415
  P/M (formula A)                  1.4     2.8       -     1.5     1.3
  E/M (formula B)                  3.9    10.6       -     1.8     1.7
  E/P Estimate                     2.8     3.8       -     1.2     1.3
  E/P from SCEMP                   1.2     1.5    1.09     1.0     2.6

LP/90 Iterations to Last Opt.
  Original Ver. 99                  69    181*       8      56     163
  SHARE Ver. 103 (Oct. 1961)        62    180*       8      28     168
    (modified weighting and row choice)
  SHARE Ver. 131 (May 1962)         54    154*      10      53     169
    (Double Pricing)
  Number of Doubles                 18     52*       4      18      50

Compute Time for Solution, minutes
  Original Ver. 99                 .47    7.85     .14     .61    3.70
  Double price Ver. 131            .24    4.30     .03     .37    2.36

*From supplied feasible starting basis. Numbers in parentheses for 
problem IV A were estimated by adding the number of iterations required 
in Ver. 131 (40) to go from the last profit value obtained in Ver. 99 to 
optimal on Ver. 103. 
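II. (Continued)

Problem                             IF      IG     IIA     IIB    IIIA    IIIB     IVA      VA   TOTAL

Description
  Rows                              67      49      32      99      79      18     245      34     846
  Columns (Structural)              72      62      78      79      87      25     352      44    1220
  Matrix Nonzeros (Structural)     630     445     775     820     880      95    2125     355    8497

Average Values
  Non-unit Vectors in Basis         35      35      20      42      55      15     222      15
  Nonbasic Columns                 100      65      84     120     110      12     200      63
  Basis Nonzeros                   240     248     170     400     450      40    1250     100
  Eta Nonzeros before INVERT      1900     900    1170    3300    2000      90    7000     826
  Eta Nonzeros after INVERT        570     540     250     700     650      50    3400     140
  P/M (formula A)                  1.4     1.8     1.2     1.4     2.1       -     3.2     1.0
  E/M (formula B)                  3.0     3.6     1.4     2.6     6.4       -    18.6     .96
  E/P Estimate                     2.1     2.0     1.2     1.9     3.0       -     5.8     1.1
  E/P from SCEMP                   2.1     2.4     2.3     2.9       -       -       -     2.5

LP/90 Iterations to Last Opt.
  Original Ver. 99                 102      86     167     118     183      22  (390)*      37    1586
  SHARE Ver. 103 (Oct. 1961)       116      82     172     115     166      22  (360)*      37    1516
  SHARE Ver. 131 (May 1962)         62      74     156     126     135      25    309*      47    1374
  Number of Doubles                 20      24      45      47      41       6     79*      14     418

Compute Time for Solution, minutes
  Original Ver. 99                1.50    1.36    2.85    2.74    3.49     .11 (75.00)*    .26   100.0
  Double price Ver. 131            .66     .74    2.08    1.50    1.35     .09   15.35     .26    29.0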



*See footnote to first part of table. 
SOME NEW ALGORITHMS FOR LINEAR PROGRAMMING

M. A. Efroymson 

ABSTRACT 

This paper presents two algorithms which improve the efficiency of the 
simplex method. 

In addition to the conventional artificial vectors, a set of additional arti- 
ficial vectors can be created which are linear combinations of the conven- 
tional artificial vectors and a selected group of real vectors. This set of 
variables has been named Implied Artificial Vectors since they can be 
generated at the time they are required and do not need to be included in 
the original matrix. Implied Artificial Vectors are used as an operational 
device which can markedly decrease the number of iterations required to 
obtain a first feasible solution. Implied Artificial Vectors are used until 
feasibility is obtained to maintain positive right-hand side elements for all 
restriction rows. Therefore, each iteration pivots on a row with a positive 
right-hand side and reduces the amount of infeasibility. 

A vector selection based on maximum change in the objective function usually requires fewer iterations than a selection based on the most negative $d_j$. However, the number of division operations per iteration is increased by the use of maximum change in the objective function, since a minimum ratio calculation is made on all vectors with negative $d_j$. When the right-hand elements are maintained at a zero or unit level, these division operations can be replaced by a simple comparison operation. A matrix updating algorithm has been developed which maintains this condition of unit or zero levels on right-hand sides and requires the same number of multiplication and division operations as the original simplex updating algorithm.
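The abstract does not spell out the updating algorithm, but the reason unit or zero right-hand sides remove the divisions can be illustrated. In the sketch below (our own, not Efroymson's code), a positive entry in a zero row gives ratio 0 at once. Otherwise every candidate ratio is $1/a_{ij}$, so the minimum belongs to the largest positive entry and a single division remains. Both functions return the same (ratio, row) pair, with ties broken by the lowest row index.

```python
def ratio_with_divisions(rhs, column):
    """Ordinary minimum-ratio test: one division per positive column entry.
    Returns (ratio, row), or None if no entry is positive."""
    ratios = [(b / a, i) for i, (b, a) in enumerate(zip(rhs, column)) if a > 0]
    return min(ratios) if ratios else None

def ratio_with_comparisons(rhs, column):
    """The same test when every right-hand side is 0 or 1."""
    best = None
    for i, (b, a) in enumerate(zip(rhs, column)):
        if a > 0:
            if b == 0:
                return 0.0, i                 # a zero row with a positive entry wins at once
            if best is None or a > column[best]:
                best = i                      # largest positive entry gives the smallest 1/a
    return None if best is None else (1.0 / column[best], best)
```
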
A FORMULA FOR RANGING THE COST OF LIVING

S. N. Afriat 

ABSTRACT 

If a consumer's preference scale $S$ is known, then any cost-of-living measurement has a point-determination. But such a scale $S$ can only be known empirically to the extent that it is compatible with a scheme $F$ of expenditure data, necessarily finite, say in respect to some $n$ commodities and some $k$ occasions, giving the price and quantity consumed of each commodity on each occasion by $k$ pairs of vectors $(p_r, x_r)$ of order $n$. Either there is an infinite class $\mathcal{S}_F$ of such scales of the normal type compatible with $F$, in which case the data is consistent, or $\mathcal{S}_F$ is empty. For each $S \in \mathcal{S}_F$ there is a determination $\rho_{rs}(S)$ of the ratio in which expenditure on occasion $r$ must be changed to compensate, according to preferences in the scale $S$, for the price change from occasion $s$. With $S$ ranging in $\mathcal{S}_F$, $\rho_{rs}(S)$ describes a certain set $I_{rs}(F)$. The problem of the cost-of-living index can be conceived of as the problem of evaluating the set $I_{rs}(F)$ from the expenditure data $F$, assumed consistent. It turns out that this set is an interval, whose limits $\rho^{-}_{rs}(F)$, $\rho^{+}_{rs}(F)$ can be evaluated.

Let $u_r = p_r / e_r$, where $e_r = p_r' x_r$, and let $D_{rs} = u_r' x_s - 1$. Let $(\Lambda, \Phi)$, where $\Lambda = \{\lambda_r\}$ and $\Phi = \{\phi_r\}$, denote any solution of the system of inequalities

$$\lambda_r > 0, \qquad \lambda_r D_{rs} > \phi_s - \phi_r \quad (r \ne s).$$

The consistency condition

$$D_{rs} \le 0,\ D_{st} \le 0,\ \ldots,\ D_{qr} \le 0 \quad \text{impossible for all cycles of distinct elements } r, s, t, \ldots, q \text{ from } 1, \ldots, k$$

is necessary and sufficient for the existence of solutions.
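The consistency condition can be tested mechanically. The sketch below is our own illustration. It builds $D_{rs} = u_r' x_s - 1$ from price and quantity vectors and looks for a cycle of distinct occasions along which every $D \le 0$. To do so, it applies Warshall's transitive closure to the links $r \to s$ ($r \ne s$) with $D_{rs} \le 0$; a closed walk through such links always contains a cycle of distinct elements. The tolerance `tol` and the function names are assumptions.

```python
from typing import List, Sequence

def afriat_D(prices: Sequence[Sequence[float]],
             quantities: Sequence[Sequence[float]]) -> List[List[float]]:
    """D[r][s] = u_r . x_s - 1, where u_r = p_r / e_r and e_r = p_r . x_r."""
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))
    k = len(prices)
    e = [dot(prices[r], quantities[r]) for r in range(k)]
    return [[dot(prices[r], quantities[s]) / e[r] - 1 for s in range(k)] for r in range(k)]

def is_consistent(D: Sequence[Sequence[float]], tol: float = 1e-12) -> bool:
    """True if no cycle of distinct occasions has all its links D <= 0."""
    k = len(D)
    reach = [[r != s and D[r][s] <= tol for s in range(k)] for r in range(k)]
    for t in range(k):                       # Warshall's transitive closure
        for r in range(k):
            if reach[r][t]:
                for s in range(k):
                    if reach[t][s]:
                        reach[r][s] = True
    return not any(reach[r][r] for r in range(k))
```

Two occasions where each bundle costs more at the other's prices pass the test. If instead the second bundle was affordable on the first occasion and the first bundle was strictly cheaper on the second, the data fail it.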

Let $\alpha = \{\alpha_r\}$, where

$$\alpha_r \ge 0, \qquad \sum_r \alpha_r = 1,$$

and let

$$x_\alpha = \sum_r x_r \alpha_r, \qquad \phi_\alpha = \sum_r \phi_r \alpha_r.$$

Let

$$\rho^{-}_{rs}(\Lambda, \Phi) = \min_{x}\left\{ u_r' x : \lambda_t u_t'(x - x_t) \ge \phi_s - \phi_t \ (t = 1, \ldots, k) \right\}$$

and

$$\rho^{+}_{rs}(\Lambda, \Phi) = \min_{\alpha}\left\{ u_r' x_\alpha : \phi_\alpha \ge \phi_s \right\}.$$

Then

$$\rho^{-}_{rs}(F) = \min_{\Lambda, \Phi} \rho^{-}_{rs}(\Lambda, \Phi), \qquad \rho^{+}_{rs}(F) = \max_{\Lambda, \Phi} \rho^{+}_{rs}(\Lambda, \Phi).$$

It is noted that $\rho^{-}_{rs}(\Lambda, \Phi)$ is the minimum of a linear function subject to a system of linear inequalities in which the coefficients are themselves solutions of a further system of inequalities; and then $\rho^{-}_{rs}(F)$ is the minimum of this minimum over all such solutions.



A Stochastic Model for Programming the Supply of a Strategic Material



Herman Karreman 
INTRODUCTION 

Strategic materials are materials which a) are essential for the proper 
functioning of a country's economy, and b) rely at the same time heavily on 
importation for their acquisition. Iron and copper, for instance, are not 
strategic materials since the second part of the definition does not apply 
to them. But nickel is a strategic material and so is manganese, which is 
needed for the production of steel of good quality. As a matter of fact, 
much of what will be said in the following applies to this latter material. 

Because of the reliance on imports, it will be clear that there is no auto- 
matic guarantee that these strategic materials will always be available at 
reasonable cost. This applies in particular to the case of "limited" war,
which is understood to be a situation between "cold" war with no overt 
hostilities and "total" war. The Korean war, with hostilities confined to a 
local area and lasting for several years, is the type of "limited" war that 
is here envisaged. In such a situation, it will be difficult to obtain sufficient 
quantities of these strategic materials at reasonable cost by importation 
only. In fact, prices (including transportation costs and insurance pre- 
miums) paid in the last "limited" war period for these imported strategic 
materials were in many cases twice the normal ones. 

To protect themselves against a repetition of this costly affair, coun- 
tries have started to buy, in normal times, extra quantities of these foreign 
ores and to stockpile them. However, the amounts of money involved to 
provide adequate protection this way are enormous. Even the resources of 
the United States have been strained, despite its rich deposits of many es- 
sential materials (iron, copper, etc.) and the large amounts of money ap- 
propriated for the purchase of extra quantities of strategic materials from 



†Second part of a study made on behalf of the Office of Defense Mobilization under Contract Nonr-1858(02) with the Office of Naval Research. Reproduction of this paper in whole or in part is permitted for any purpose of the United States government.

‡I am indebted to Harlan Mills for many helpful suggestions, in particular with respect to what is said at the end about the final state of the system, and to Stuart Dreyfus for drawing the flow-chart for the first computation.
foreign countries. The stockpiles of imported ores, built up in the past by
the U. S. government, will in many cases provide industry with only a frac- 
tion of the extra quantities which it will need in a limited war period. 

There are, of course, in some instances, deposits of these strategic 
materials in the country too. The quality of these domestic ores is on the 
average inferior to that of foreign ores, and they must first be upgraded to 
meet the standards set by industry. In normal times this makes them more 
costly than foreign ores (otherwise, they would have been used in the past). 
However, the technology of upgrading the ores of low quality is steadily 
improving and domestic ores might turn out to be profitable in a future 
period of limited war, when prices of imported minerals will again be high. 

Assuming for a moment that we are on the eve of a limited war which 
will last for several years, the question then is: Given a small stock of 
ore at the beginning of the period, how much has to be imported and how 
much has to be produced domestically each year to meet the requirements 
of a certain strategic material in the following n years of limited war at 
minimum cost? 

To answer this question, a model was constructed in which the various 
ways of meeting the requirements and the restrictions imposed on them 
were formulated. The objective function, being a cost function which had to 
be minimized, contained first order as well as many second order terms. 
This quadratic programming problem was solved twice, once by the simplex 
method, adapted to solve this sort of problem, and once by the gradient 
method.† The result obtained from that model showed that more than half of what was needed on top of the initial stockpile would have to come from domestic sources. Moreover, the solution appeared to be highly sensitive to a reduction in the costs of upgrading domestic ore on account of technological improvements.‡

So far, the underlying assumption was that of being on the eve of a 
limited war. This assumption was, of course, rather restrictive, since 
several other political situations are possible. To make the analysis more 
general, a stochastic model has been developed which takes the three possibilities of "no war," "cold war," and "limited war" into consideration.
The probabilities of transition from one political situation into another 
have been assumed to be those found in the following matrix, P: 



If the situation in year n is that of the row, then the probability of occurrence in year (n + 1) is assumed to be, for the column:

                  no war    cold war    limited war
  no war            .70        .20          .10
  cold war          .05        .85          .10
  limited war       .10        .40          .50



†The first method was developed by Philip Wolfe and the second by Ben Rosen.

‡A description of the model and the results obtained from it can be found in Ref. 1.
These transition probabilities reflect, of course, certain personal views
of the political situation at the time they were decided upon. Still, they do 
not seem unreasonable and would perhaps even hold in the present situa- 
tion. 

It should be observed here that this matrix P of transition probabilities 
is not a doubly stochastic matrix, since the columns do not add up to 1 
(the rows do) . The characteristic values and corresponding characteristic 
vectors of P are: 

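$$\lambda_1 = 1, \qquad \lambda_2 = .65, \qquad \lambda_3 = .40,$$

with corresponding characteristic vectors

$$x_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad x_2 = \begin{pmatrix} 4 \\ -1 \\ 0 \end{pmatrix}, \qquad x_3 = \begin{pmatrix} 1 \\ 1 \\ -5 \end{pmatrix}.$$

These 3 characteristic vectors are independent, so that the matrix $T$ formed by them has a nonzero determinant and an inverse:

$$T = \begin{pmatrix} 1 & 4 & 1 \\ 1 & -1 & 1 \\ 1 & 0 & -5 \end{pmatrix}, \qquad \det T = 30, \qquad T^{-1} = \frac{1}{30}\begin{pmatrix} 5 & 20 & 5 \\ 6 & -6 & 0 \\ 1 & 4 & -5 \end{pmatrix}.$$

From this it follows that the matrix $T^{-1} P T$ will have the characteristic values as diagonal elements and that $\lim_{n\to\infty} P^n \ne 0$:

$$T^{-1} P T = \begin{pmatrix} 1 & 0 & 0 \\ 0 & .65 & 0 \\ 0 & 0 & .40 \end{pmatrix}, \qquad \lim_{n\to\infty} P^n = \begin{pmatrix} 5/30 & 20/30 & 5/30 \\ 5/30 & 20/30 & 5/30 \\ 5/30 & 20/30 & 5/30 \end{pmatrix}.$$

The meaning of the latter is that if these probabilities were to remain the same for an indefinitely long period of time, then the probability of occurrence of these three political situations would be:

no war 5/30, cold war 20/30, limited war 5/30, or 1/6, 4/6, 1/6.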
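The figures quoted above can be verified exactly. The sketch below enters $P$, the matrix $T$ of characteristic vectors and $T^{-1}$ as given in the text. It checks that $T^{-1} P T$ is diagonal with entries $1, .65, .40$. It then solves $\pi P = \pi$, $\sum_i \pi_i = 1$ for the limiting probabilities with exact fractions. One balance equation is redundant and is replaced by the normalization. The helper names are our own.

```python
from fractions import Fraction as F
from typing import List

Matrix = List[List[F]]

# Transition matrix P from the text; order: no war, cold war, limited war.
P: Matrix = [[F(70, 100), F(20, 100), F(10, 100)],
             [F(5, 100),  F(85, 100), F(10, 100)],
             [F(10, 100), F(40, 100), F(50, 100)]]
T: Matrix = [[F(1), F(4), F(1)], [F(1), F(-1), F(1)], [F(1), F(0), F(-5)]]
T_inv: Matrix = [[F(v, 30) for v in row] for row in ([5, 20, 5], [6, -6, 0], [1, 4, -5])]

def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum((A[i][k] * B[k][j] for k in range(len(B))), F(0))
             for j in range(len(B[0]))] for i in range(len(A))]

def stationary(P: Matrix) -> List[F]:
    """Solve pi P = pi with sum(pi) = 1 exactly by Gauss-Jordan elimination."""
    n = len(P)
    # Balance equations j = 0 .. n-2: sum_i pi_i (P[i][j] - delta_ij) = 0.
    rows = [[P[i][j] - (1 if i == j else 0) for i in range(n)] + [F(0)] for j in range(n - 1)]
    rows.append([F(1)] * n + [F(1)])          # normalisation: sum of pi is 1
    for c in range(n):
        p = next(r for r in range(c, n) if rows[r][c] != 0)
        rows[c], rows[p] = rows[p], rows[c]
        piv = rows[c][c]
        rows[c] = [v / piv for v in rows[c]]
        for r in range(n):
            if r != c and rows[r][c] != 0:
                factor = rows[r][c]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[c])]
    return [rows[i][n] for i in range(n)]
```

`stationary(P)` returns the fractions $1/6$, $2/3$ and $1/6$, which is $5/30$, $20/30$ and $5/30$ as stated.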
These political situations influence the program in two distinct ways.
First, the requirements will be different; a much larger quantity will be 
needed in a period of limited war than in one of no war, while the quantity 
needed in a cold war period will lie somewhere in between. 

Secondly, the prices at which the imported ores can be obtained will be 
dependent on the political situation; prices will be high in a period of 
limited war and low in that of no war, with prices in a cold war period 
lying between these two extremes. 

In addition, these import prices are found to depend on the quantities 
of foreign materials which will be purchased by the U.S. This merely 
reflects the important role the U.S. plays as buyer of these foreign ores. 
In other words, these import prices are a function of the quantities to be 
bought, which fact leads to quadratic terms in the objective function, as 
well as linear terms, of course. 

The same holds for the costs of upgrading the domestic ores. They too 
are a function of the quantities which will be upgraded, due to economies 
of scale. However, there is this important distinction, that the quadratic 
terms in the objective function, resulting from the importation of foreign 
ores, have positive coefficients, while those associated with the upgrading 
of domestic ores have negative coefficients. 

This is also true for the costs of processing the foreign and upgraded 
domestic ores into alloys; they too lead to quadratic terms with negative 
coefficients in the objective function in addition to linear terms. The same 
holds for the costs of constructing the plants designed to upgrade the do- 
mestic ores, combined or not with their processing into alloys. 

The technological structure underlying the various activities is an es- 
sential element in the formulation of the problem. The relationships be- 
tween the importation of foreign ores, the upgrading of domestic ores, 
the construction of upgrading plants, the storage of foreign and domestic 
ores, the processing of these ores into alloys, and the constraints imposed 
on each of them found their expression in a model, which will be described 
in more detail in the following section. 

The problem has been solved by the technique of dynamic programming, 
which puts no limitations on the form of the objective function. On the other 
hand, this technique can handle discrete quantities only, so that much de- 
pends on the fineness of the grid. At the same time, the procedure is very 
time-consuming so that the available amount of computer time is crucial. 
In this particular case, a rather coarse grid was all that was feasible, 
given the limited resources. 

The main result is, to a certain extent, a confirmation of the outcome of 
the nonstochastic model, namely that a much larger share of the require- 
ments should be met by domestic production. Even in the case of a "no 
war" situation, the importance of developing domestic resources should 
not be entirely overlooked. 

Another feature brought out by the analysis is that the system has an ergodic property, in the sense that it tends strongly to a particular final state, of which more will be said later.
The technological structure underlying the model is pictured in the following diagram.

[Fig. 1: technological structure of the model; labels: final products, second stage of production, high-quality ores, first stage of production, storage, production, production and plant construction]



Starting from the top, it can be seen that the final demand for manganese, the material for which this study was made, consists of two parts: one for ferro-manganese and one for silico-manganese. Each of these two types of alloys can be obtained from high-quality ores along conventional lines (processes 1 and 3) or from medium-quality ore by special treatment (processes 2 and 4). The high-quality ores, in turn, can be obtained by importation (process 5) or by upgrading domestic source material (process 6). The quantities acquired this way are added to the stockpile, which, in turn, supplies part of the quantities needed in the production of alloys. Finally, there are two beneficiation plants in which the low-quality ores of domestic origin have to be upgraded before they can be processed further. The problem is to find the combination of stockpiling and plant construction that will produce, at minimum cost, the quantities of alloys needed in the various possible sequences of political situations.
The formulation of this problem led to the model now to be described. Its notation corresponds as far as possible to that of a preceding article on the same subject, but it was necessary to deviate from that notation at certain points. The following symbols are used:

a) parameters in price and cost functions: α, β, γ, δ
b) quantities† required: ℓ
c) quantities† to be determined
   for each year: x, y
   for a period of years: s, t, u
d) technical coefficients: c
e) transition probabilities: p
f) indices
   individual years: i
   summation of years: j
   external states in particular
      no war: I
      cold war: II
      limited war: III
   external states in general: e, f, g
   internal states in general: k, l, m
   individual processes: 1, 2, ..., 7
   summation of processes: r

The quantities (x's) that have to be imported or produced domestically 
can be found in the diagram along the lines leading upwards from the 
circles that represent the corresponding activities. The increases in the 
plant capacities (y's) are located at the outer of two concentric circles. 

Denoting the required quantities of ferro-manganese in year i by \(\ell_{1,i}\) and those of silico-manganese by \(\ell_{2,i}\), and looking at the diagram, we obtain the following two equalities:

$$x_{1,i} + x_{2,i} = \ell_{1,i}, \qquad i = 1, 2, \ldots, n \tag{1}$$

$$x_{3,i} + x_{4,i} = \ell_{2,i}, \qquad i = 1, 2, \ldots, n \tag{2}$$

They simply state that what is required in a particular year also has to be produced in that year. In other words, the model leaves no room for a shortage of these alloys, since a shortage would have disastrous consequences. There are also good reasons why no allowance has been made for stockpiling alloys. Note, however, that the requirements are no longer assumed to be known beforehand, as they were in the nonstochastic model. Instead, they depend on the sequence of political situations in the future.

†In net tons of pure manganese.

The third equality states that what is added to the stockpile in a certain year, plus what was there at the beginning of that year, must equal what is taken out of it in that year plus what is left over at the end of that year. This leftover equals the amount at the beginning of the next year:

$$s_i + x_{5,i} + x_{6,i} = c_1 x_{1,i} + c_3 x_{3,i} + s_{i+1}, \qquad i = 1, 2, \ldots, n \tag{3}$$

The letters \(c_1\) and \(c_3\) denote the quantities of manganese in the form of (high-quality) ores that are needed to produce 1 N.T. of manganese in the form of ferro- or silico-manganese. It has been assumed that these technical coefficients remain the same during the entire period covered by the program.

The fourth and fifth relationships ensure that the quantity of ore upgraded in any year cannot exceed the capacities of the upgrading plants at the beginning of that year. These capacities, in turn, equal the capacities at the beginning of the previous year plus what was added to them in that previous year:

$$x_{6,i} \le t_{6,i} = t_{6,i-1} + y_{6,i-1}, \qquad i = 1, 2, \ldots, n \tag{4}$$

$$c_2 x_{2,i} + c_4 x_{4,i} \le u_{7,i} = u_{7,i-1} + y_{7,i-1}, \qquad i = 1, 2, \ldots, n \tag{5}$$

Here \(t_{6,i}\) and \(u_{7,i}\) denote the capacities of the upgrading plants at the beginning of year i, and \(y_{6,i-1}\) and \(y_{7,i-1}\) stand for the increases in these capacities in the previous year.

As far as the alloy-producing plants are concerned, it can safely be assumed that their capacities will be large enough to process all the ore offered to them. Hence, there is no need for another set of constraints.

The same holds for the quantities of domestic ore to be extracted from 
the various deposits. Here, too, it can be assumed that these deposits con- 
tain sufficient quantities of ore to fill the needs for a good number of years. 

Finally, no x or y variable is allowed to assume a negative value:

$$x_{r,i} \ge 0, \qquad r = 1, 2, \ldots, 6;\; i = 1, 2, \ldots, n \tag{6}$$

$$y_{r,i} \ge 0, \qquad r = 6, 7;\; i = 1, 2, \ldots, n \tag{7}$$

In summary, the model consists of 3 equalities and 2 inequalities (be- 
sides the 2 just mentioned) involving a total of 8 activities, of which 6 are 
related to the actual production of manganese and the other 2 to the con- 
struction and expansion of the upgrading plants. 
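The one-year bookkeeping implied by constraints (1) to (7) can be sketched as a function that checks a candidate decision and returns the internal state (s, t, u) at the beginning of the next year. This is a minimal illustration, not the program used in the study. The function name `next_state`, the dictionary layout, the explicit nonnegative-stockpile check, and all numbers in the usage lines are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Internal state (s, t, u): stockpile, capacity of upgrading plant 6, capacity of plant 7.
State = Tuple[float, float, float]


def next_state(state: State, x: Dict[int, float], y: Dict[int, float],
               req1: float, req2: float, c: Dict[int, float],
               tol: float = 1e-9) -> Optional[State]:
    """Apply one year's decision (x_1..x_6, y_6, y_7) to the internal state.

    Returns the internal state at the beginning of the next year, or None if
    the decision violates one of the constraints (1)-(7).
    """
    s, t, u = state
    # (6), (7): no activity may take a negative value
    if any(x[r] < -tol for r in range(1, 7)) or any(y[r] < -tol for r in (6, 7)):
        return None
    # (1), (2): this year's ferro- and silico-manganese requirements are met exactly
    if abs(x[1] + x[2] - req1) > tol or abs(x[3] + x[4] - req2) > tol:
        return None
    # (4), (5): upgrading cannot exceed the capacities at the beginning of the year
    if x[6] > t + tol or c[2] * x[2] + c[4] * x[4] > u + tol:
        return None
    # (3): stockpile balance; what is left over starts the next year
    s_next = s + x[5] + x[6] - c[1] * x[1] - c[3] * x[3]
    if s_next < -tol:  # the stockpile cannot go negative (implicit in the model)
        return None
    # capacity added this year becomes available at the start of the next year
    return (s_next, t + y[6], u + y[7])


# Hypothetical figures (million N.T. of manganese), for illustration only
c = {1: 1.1, 2: 1.5, 3: 1.2, 4: 1.4}
x = {1: 0.6, 2: 0.1, 3: 0.3, 4: 0.0, 5: 0.4, 6: 0.5}
y = {6: 0.0, 7: 0.1}
print(next_state((1.0, 0.5, 0.2), x, y, req1=0.7, req2=0.3, c=c))
```

A decision that imports too little, upgrades beyond capacity, or misses a requirement returns `None`. Such a decision corresponds to an infeasible path that the dynamic programming procedure does not need to cost.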

The objective is to select the combination of x's and y's, one combination for every year, that meets the requirements in all successive years at minimum cost. This minimum cost will depend on the situation at the beginning of the first year. Hence, there will be as many cost minima as there are possible initial situations.

Each initial situation is characterized by an external and an internal 
state of the system. The external state is the political environment, being 
one of "no war," of "cold war," or of "limited war," the only three pos- 
sibilities considered here. The internal state is determined by the capacity 
of each of the two upgrading plants and the quantity of ore in the stockpile 
at the beginning of the year. 

The activities of a certain year not only have to meet the requirements 
of that year, but also transform the internal state of the system at the be- 
ginning of that year into a generally different one at the end of it. This 
latter state should then be the one that is most favorable from an economic 
point of view for meeting the requirements of future years in the light of 
what can be expected to happen politically. 

Let \(M_{e,k,j}\) denote the minimum cost of a program covering j years, counting backwards in time from the last year n, with the e-th external and the k-th internal state at the beginning of year j. Then \(M_{e,k,j}\) can be defined as:

$$M_{e,k,j} = \min_{l} \Big[ C_{e,k,l,j} + \rho \sum_{f=\mathrm{I}}^{\mathrm{III}} p_{e,f}\, M_{f,l,j-1} \Big] \tag{8}$$

Here \(C_{e,k,l,j}\) stands for the cost of meeting the requirements of the j-th year (a function of the particular e-th external state) and of transforming the k-th internal state at the beginning of that year into the l-th internal state at the end of it. Hence, \(C_{e,k,l,j}\) is a function of the 8 activities that perform this dual function:

$$C_{e,k,l,j} = \sum_{r=1}^{6} x_{r,j}\, v_{r,j} + \sum_{r=6}^{7} y_{r,j}\, w_{r,j} \tag{9}$$

The \(v_{r,j}\) variables in this expression stand for the prices of imported foreign ores, the unit cost of upgraded domestic ores, and the unit cost of alloys. The \(w_{r,j}\) variables denote the cost of constructing or expanding the capacities of the plants by one unit per year.

The price to be paid for a unit of imported ore in any one year i, \(v_{5,i}\), is first of all a function of \(x_{5,i}\), the quantity of ore imported in that year:

$$v_{5,i} = \alpha_5 x_{5,i} + \beta_5, \qquad i = 1, 2, \ldots, n$$

The α-coefficient in this expression is positive. This reflects the dominant position of the United States as a buyer of large quantities of ore in the foreign markets. Consequently, the resulting quadratic terms in \(x_5\) in the total cost expression of imported ores all have positive signs.

The price of imported ore also depends on the political situation. The transportation costs and insurance premiums are both incorporated in the import prices, and both are greatly affected by the political situation. Hence there are actually 3 price functions for these imported ores, one for each political situation, each with its own β-coefficient:

$$\begin{aligned}
\text{no war:}\quad & v^{\mathrm{I}}_{5,i} = \alpha_5 x_{5,i} + \beta^{\mathrm{I}}_5, && i = 1, 2, \ldots, n\\
\text{cold war:}\quad & v^{\mathrm{II}}_{5,i} = \alpha_5 x_{5,i} + \beta^{\mathrm{II}}_5, && i = 1, 2, \ldots, n\\
\text{limited war:}\quad & v^{\mathrm{III}}_{5,i} = \alpha_5 x_{5,i} + \beta^{\mathrm{III}}_5, && i = 1, 2, \ldots, n
\end{aligned}$$

The unit costs of the upgraded domestic ores, as well as those of the alloys, are also functions of the corresponding quantities:

$$\begin{aligned}
v_{1,i} &= \alpha_1 x_{1,i} + \beta_1, & i &= 1, 2, \ldots, n\\
v_{2,i} &= \alpha_2 x_{2,i} + \beta_2, & i &= 1, 2, \ldots, n\\
v_{3,i} &= \alpha_3 x_{3,i} + \beta_3, & i &= 1, 2, \ldots, n\\
v_{4,i} &= \alpha_4 x_{4,i} + \beta_4, & i &= 1, 2, \ldots, n\\
v_{6,i} &= \alpha_6 x_{6,i} + \beta_6, & i &= 1, 2, \ldots, n
\end{aligned}$$

In contrast to the α-coefficient in the price function of imported ore, the α's in these last five expressions all have negative signs. In other words, the unit costs decrease as the produced quantities increase, owing to economies of scale. Consequently, the resulting quadratic terms in the corresponding x-variables of the total cost function all have negative signs.

Note also that the expressions for \(v_{2,i}\) and \(v_{4,i}\) stand for the combined cost of upgrading domestic material and processing it into alloys. The α- and β-coefficients of these two expressions have been determined on the basis of the technical coefficients \(c_2\) and \(c_4\). These coefficients indicate how many units of manganese in the form of upgraded ore are needed to produce one unit of manganese in the form of alloys. As in the nonstochastic model, these technical coefficients have been assumed to remain the same during the period under consideration.

The unit costs of constructing and expanding the upgrading plants in year i are:

$$\begin{aligned}
w_{6,i} &= \gamma_6 y_{6,i} + \delta_6, & i &= 1, 2, \ldots, n\\
w_{7,i} &= \gamma_7 y_{7,i} + \delta_7, & i &= 1, 2, \ldots, n
\end{aligned}$$

As with the unit costs of the upgraded domestic ores and alloys, the γ-coefficients in these two expressions have negative signs. Consequently, the resulting quadratic terms in the corresponding y-variables of the total cost function have negative signs.
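With linear unit-cost functions, each term \(x_r v_r\) of (9) becomes \(\alpha_r x_r^2 + \beta_r x_r\), and each \(y_r w_r\) becomes \(\gamma_r y_r^2 + \delta_r y_r\). This is where the quadratic terms come from. The following is a minimal sketch of the one-year cost. It does not apply the write-off rule for construction costs. All coefficient values in the usage lines are hypothetical and are chosen only to respect the stated signs (\(\alpha_5 > 0\); \(\alpha_6 < 0\) and \(\gamma_6 < 0\)). The import intercept \(\beta_5\) varies with the political situation, as described above.

```python
from typing import Dict, Tuple

# r -> (slope, intercept) of a linear unit-cost function
Coef = Dict[int, Tuple[float, float]]


def yearly_cost(x: Dict[int, float], y: Dict[int, float],
                v_coef: Coef, w_coef: Coef) -> float:
    """One-year cost as in (9): sum of x_r * v_r + y_r * w_r, with
    v_r = alpha_r * x_r + beta_r and w_r = gamma_r * y_r + delta_r."""
    total = 0.0
    for r, qty in x.items():
        alpha, beta = v_coef[r]
        total += qty * (alpha * qty + beta)   # alpha_r x_r^2 + beta_r x_r
    for r, qty in y.items():
        gamma, delta = w_coef[r]
        total += qty * (gamma * qty + delta)  # gamma_r y_r^2 + delta_r y_r
    return total


# Hypothetical coefficients: import price rises with quantity (alpha_5 > 0),
# upgrading and plant expansion show economies of scale (alpha_6, gamma_6 < 0).
beta5 = {"I": 40.0, "II": 45.0, "III": 55.0}   # import intercept per political situation
for situation, b5 in beta5.items():
    v_coef = {5: (0.5, b5), 6: (-0.5, 60.0)}
    print(situation, yearly_cost({5: 10.0, 6: 10.0}, {6: 2.0}, v_coef, {6: (-1.0, 20.0)}))
```

Because some quadratic terms are concave (negative coefficients), the cost is not convex in the decision variables. This makes a grid-based method, rather than a method that relies on convexity, a natural choice.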
For the write-off of these costs, the same depreciation rule has been adopted as in the nonstochastic model. The entire cost of construction or expansion is written off in 10 equal installments, starting with the year after that of construction or expansion. Hence, the total costs also depend on j, the number of years covered by the program, as long as \(j \le 10\).
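The write-off rule can be read as follows. A construction cost incurred in a year with j years left in the program (that year included) is charged in installments of one tenth. The installments start in the next year, so only min(j - 1, 10) of them fall inside the program. The function below is a hypothetical illustration of that reading, not code from the study. It ignores discounting, which the text does not specify for the installments.

```python
def charged_within_program(cost: float, j: int, installments: int = 10) -> float:
    """Part of a construction cost incurred in year j (j = years left in the
    program, counted backwards, this year included) that is written off before
    the program ends. Installments start in the year after construction."""
    if j < 1:
        raise ValueError("j must be at least 1")
    years_after = j - 1                        # years remaining after the construction year
    charged = min(years_after, installments)   # installments falling inside the program
    return cost * charged / installments


# The charge depends on j only while j <= 10; from j = 11 on it is the full cost.
for j in (1, 5, 10, 11, 15):
    print(j, charged_within_program(100.0, j))
```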

The second term on the right-hand side of equation (8) can be written more explicitly:

$$\rho \sum_{f=\mathrm{I}}^{\mathrm{III}} p_{e,f}\, M_{f,l,j-1} = \rho \left( p_{e,\mathrm{I}} M_{\mathrm{I},l,j-1} + p_{e,\mathrm{II}} M_{\mathrm{II},l,j-1} + p_{e,\mathrm{III}} M_{\mathrm{III},l,j-1} \right) \tag{10}$$

The various costs of the program occur in different years and have to be put on a common basis, i.e., discounted to the beginning of the first year. This is the role of the discounting factor ρ. The \(p_{e,\mathrm{I}}\), \(p_{e,\mathrm{II}}\), and \(p_{e,\mathrm{III}}\) are the transition probabilities from the particular external state e at the beginning of year j into the first, second, or third external state at the beginning of the next year, year j - 1. Here e is one of the three possible external states considered, and the years are counted backwards in time.

\(M_{\mathrm{I},l,j-1}\), \(M_{\mathrm{II},l,j-1}\), and \(M_{\mathrm{III},l,j-1}\) denote the minimum cost of a (j - 1)-year program. The program starts from the first (no war), second (cold war), or third (limited war) external state and from the l-th internal state at the beginning of year j - 1. Note that the l-th internal state at the beginning of year j - 1 is the same as that at the end of the preceding year j, which appears in expression (9) and implicitly in expression (8). The general expression for these minimum costs, \(M_{f,l,j-1}\), can be written as:

$$M_{f,l,j-1} = \min_{m} \Big[ C_{f,l,m,j-1} + \rho \sum_{g=\mathrm{I}}^{\mathrm{III}} p_{f,g}\, M_{g,m,j-2} \Big] \tag{11}$$

This formula is similar to (8) for \(M_{e,k,j}\). The difference is that it starts from the f-th external and l-th internal states at the end of year j (= the beginning of year j - 1), and that these are followed by the g-th external and m-th internal states at the end of year j - 1.

In this model, the internal states are determined by a particular amount of manganese in the stockpile (s), a particular capacity of the first upgrading plant (t), and a particular capacity of the second upgrading plant (u). In other words, the number of internal states equals the number of permissible stockpile levels, times the number of permissible capacities of the first plant, times the number of permissible capacities of the second plant.

From formula (8) it can be seen that a particular internal state has to 
be created at the end of year j which minimizes the cost of that year, plus 
the costs which can be expected to occur in the remaining years of the 
program on the basis of the probabilities of transition of external states. 
It should be kept in mind that this particular internal state at the end of 
year j depends on the internal state k at the beginning of that year, on the 
requirements of that year, given the political situation at the beginning of 
that year, and the quantities x and y which are ultimately the decisive 
elements of the program. The ideas in the last two sentences also apply to 
formula (11). How these quantities have been obtained will be discussed in 
the next section. 



COMPUTATIONS AND RESULTS 

The problem as formulated belongs to the class of dynamic programming problems. It has been solved by a recursive procedure based on the so-called "principle of optimality."† A peculiar feature of this procedure is that it examines the last year of the program first, then the year before last, and so on, so that the first year of the program enters the computation last. The reason is as follows. Once the lowest-cost path has been found between the internal state at the beginning of a period and that at its end, it remains the lowest-cost path in all subsequent computations involving that particular initial internal state. This permits an enormous saving in the number of paths that must be examined in the search for the least expensive one connecting the state of the system at the beginning of one year with that at the end of another year.‡
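The backward recursion (8), with its expected-cost term written out as in (10), can be sketched in a few lines. This is a minimal illustration, not the study's IBM 704 program. The one-year cost \(C_{e,k,l}\) is passed in as a function that returns `None` when l cannot be reached from k. In the study this cost is itself the minimum over feasible x and y. Here it is assumed not to depend on j. The toy cost, the three internal states, and the third row of P in the usage lines are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Table = List[List[float]]          # indexed [e][k]


def backward_recursion(P: Sequence[Sequence[float]], n_internal: int, years: int,
                       cost: Callable[[int, int, int], Optional[float]],
                       rho: float) -> Tuple[List[Table], List[List[List[Optional[int]]]]]:
    """Backward recursion (8):
        M[j][e][k] = min over l of  C(e, k, l) + rho * sum_f P[e][f] * M[j-1][f][l],
    with M[0] = 0 and j counting years backwards from the end of the program.
    cost(e, k, l) is the one-year cost of meeting the requirements in external
    state e while moving internal state k to l, or None if l is unreachable.
    Returns M and the minimizing end-of-year internal state policy[j][e][k]."""
    n_ext = len(P)
    M: List[Table] = [[[0.0] * n_internal for _ in range(n_ext)]]
    policy: List[List[List[Optional[int]]]] = [[[None] * n_internal for _ in range(n_ext)]]
    for j in range(1, years + 1):
        prev = M[-1]
        M_j = [[float("inf")] * n_internal for _ in range(n_ext)]
        pol_j: List[List[Optional[int]]] = [[None] * n_internal for _ in range(n_ext)]
        for e in range(n_ext):
            for k in range(n_internal):
                for l in range(n_internal):
                    c = cost(e, k, l)
                    if c is None:
                        continue
                    # expected minimum cost of the remaining j - 1 years, as in (10);
                    # zero-probability terms are skipped to avoid 0 * inf
                    future = sum(P[e][f] * prev[f][l] for f in range(n_ext) if P[e][f] > 0)
                    total = c + rho * future
                    if total < M_j[e][k]:
                        M_j[e][k], pol_j[e][k] = total, l
        M.append(M_j)
        policy.append(pol_j)
    return M, policy


# Hypothetical usage: 3 internal states, a toy cost, discount factor 0.9
P = [[0.70, 0.20, 0.10], [0.05, 0.85, 0.10], [0.10, 0.40, 0.50]]  # last row assumed
M, policy = backward_recursion(P, 3, 2, lambda e, k, l: float((l - k) ** 2 + (2 - l)), 0.9)
print(round(M[2][0][0], 6), policy[2][0][0])  # 2.9 1
```

Each year's table needs only the previous year's table. This is the saving the principle of optimality provides: the work grows linearly in the number of years, instead of exponentially.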

Still, only a rather limited number of states can be examined each year this way, and the accuracy of the results depends to a great extent on the fineness of the grid. In this particular example, a rather coarse mesh of 100,000 N.T. of manganese has been used. It permits, however, a rather wide range for the quantity of manganese in the stockpile: from 0 to 2,000,000 N.T. In addition, three different capacities of each of the two upgrading plants have been taken into account. Consequently, there are \(20 \times 3 \times 3 = 180\) different internal states for each of the three external states. Suppose for a moment that all paths connecting these internal states are feasible. Then \(3 \times 180^{10}\) paths would have to be examined for a program covering a 10-year period. By using the principle of optimality, however, this number is reduced to \(3 \times 10 \times 180^{2} = 972{,}000\). These are, of course, upper bounds, since the need to meet the requirements eliminates a number of feasible internal states at the end of each year. Still, these figures give some idea of the amount of work saved by the "principle of optimality."
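The counts in the preceding paragraph can be reproduced directly, together with the 540 states of the system mentioned further on. This short check uses only figures given in the text:

```python
stock_levels = 20      # stockpile levels on the 100,000 N.T. grid, as stated in the text
capacities = 3         # admissible capacities of each of the two upgrading plants
external = 3           # no war, cold war, limited war
years = 10             # length of the program

internal = stock_levels * capacities * capacities     # internal states per external state
states = external * internal                          # states of the system
exhaustive = external * internal ** years             # every path of the program enumerated
with_optimality = external * years * internal ** 2    # one year-to-year step at a time
print(internal, states, with_optimality)               # 180 540 972000
print(f"{exhaustive:.2e}")
```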

Nevertheless, a high-speed electronic computer of the IBM 704 class is clearly a prerequisite for the amount of work that remains in applications of this kind. Besides computing the costs of each permissible path, the computer was instructed to print out the following for each internal state at the beginning of a year: the cost of the least expensive path, the corresponding x and y values, and the resulting internal state at the end of that year. The latter provides, first of all, a check on the computations. It also proves to be an important source of information, of which more will be said later.

†For a description of this principle, the reader is directed to Refs. 2 and 3.

‡The interested reader will find a worked-out example in Ref. 5, pp. 18-33.

It follows that any n-year program will have as many minima as there are different states of the system at the beginning of that program. In this particular case, there are 3 different external states and 180 different internal states associated with each one of them. Accordingly, each n-year program has 540 minima and a selection of 540 × n internal states, since the minimum-cost path corresponding to each of these minima runs over n internal states in an n-year program.

The cost minima of a 10-year manganese program range between $1.378 billion and $1.986 billion, depending on the state of the system at the beginning of the program.† The difference between two neighboring minima in this table indicates how much extra has to be paid, or how much is saved, because of the difference in quantity between the two corresponding initial states. These differences are essentially of the same nature as the shadow prices in the nonstochastic model.

The results of the nonstochastic model, designed for a 6 -year period of 
limited war, have been compared with those of the stochastic model cover- 
ing the same number of years. As one would expect, the costs of the non- 
stochastic model are higher than those of the stochastic model, since the 
latter also takes into account the possibilities of cold war and no war, 
while the former did not. 

Also, it has been shown how two different sequences of external states 
would affect the minimum cost and the corresponding selection of internal 
states of a 4-year program. The same has been done for the first four 
years of a ten-year program, after which the two outcomes have been com- 
pared. These examples show how one can use the information contained in 
the tables of selected internal states to find the best course of action for 
every future sequence of external states. To put it differently, these tables 
of internal states of minimum-cost programs enable one to make full use 
of the information on the political situation as this becomes available in the 
course of time. 

The tables of selected internal states are, moreover, interesting from another point of view. To demonstrate this, a 540 × 540 matrix Q will be used, in which each row and each column denotes one of the 540 possible states of the system. Figure 2 will make this clear.

This figure shows, for instance, that in the case of no war a (1,1,19) internal state at the beginning of the first year will result in a (1,1,18) internal state at the end of that year. According to the earlier-described matrix P of transition probabilities, there is a 0.70 chance that there will still be a no-war situation at the end of the first year. There is a 0.20 chance that the no-war situation has changed into cold war, and a 0.10 chance that it has changed into limited war. Hence, in row (n.w.; 1,1,19), a value of 0.70 is assigned to the element in column (n.w.; 1,1,18), a value of 0.20 to the element in column (c.w.; 1,1,18), and a value of 0.10 to the element in column (l.w.; 1,1,18). These three elements take the same position in blocks I, II, and III, respectively. To take another example, in the case of cold war, a (0,2,14) internal state at the beginning of the first year will lead to a (0,2,9) internal state at the end of that year. Hence, row (c.w.; 0,2,14) receives a value of 0.05 in column (n.w.; 0,2,9), 0.85 in column (c.w.; 0,2,9), and 0.10 in column (l.w.; 0,2,9). This process is continued until the changes in all 540 initial states of the system have found their place in the Q matrix, each nonzero element having an appropriate probability value assigned to it. The final result is a Q matrix with 3 × 540 = 1620 nonzero elements, which can be partitioned into 3 identical strips of 540 rows and 180 columns. These nonzero elements indicate how the states of the system at the beginning of the first year of a ten-year program will be transformed into probable states of the system at the beginning of the second year. By probable states are meant states with a certain probability of realization assigned to them.

Fig. 2

†These cost minima can be found in Table 5, pp. 35-36, and the corresponding selection of internal states in App. II, pp. 1-27, of Ref. 5.
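The construction of Q can be sketched on a much smaller hypothetical system. It has three internal states instead of 180, and a made-up policy table that gives, for each external state e and internal state k, the internal state chosen for the end of the year. In the study this information comes from the printed tables of selected internal states. P has the no-war and cold-war rows quoted in the text and an assumed limited-war row (0.10, 0.40, 0.50). States are numbered block by block in external state, so state (e, k) has index e × 3 + k.

```python
from typing import List

# External states: 0 = no war, 1 = cold war, 2 = limited war.
# Rows 0 and 1 are quoted in the text; row 2 is an ASSUMPTION.
P = [[0.70, 0.20, 0.10],
     [0.05, 0.85, 0.10],
     [0.10, 0.40, 0.50]]

# HYPOTHETICAL policy table: policy[e][k] = internal state at the end of the year
# when the year starts in external state e and internal state k (3 internal states).
policy = [[1, 2, 2],
          [1, 1, 2],
          [2, 2, 2]]


def build_q(P: List[List[float]], policy: List[List[int]]) -> List[List[float]]:
    """Q[(e,k)][(f,l)] = P[e][f] if l = policy[e][k], else 0.
    States are numbered e * n_int + k, i.e. in blocks by external state."""
    n_ext, n_int = len(P), len(policy[0])
    size = n_ext * n_int
    Q = [[0.0] * size for _ in range(size)]
    for e in range(n_ext):
        for k in range(n_int):
            l = policy[e][k]
            for f in range(n_ext):
                Q[e * n_int + k][f * n_int + l] = P[e][f]
    return Q


Q = build_q(P, policy)
nonzeros = sum(1 for row in Q for q in row if q != 0.0)
print(len(Q), nonzeros)  # 9 27
```

As in the text, each row has exactly three nonzero elements, one per external state at the end of the year. Q therefore has 3 × (number of states) nonzeros, and every row sums to 1.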

In this conception, a particular initial state of the system is represented by a vector b of length 540 whose elements are all zero except one, which has the value 1. If this vector b is multiplied by the matrix Q, the resulting vector b′ shows how that initial state will have been transformed into various states at the beginning of the second year. A certain probability of appearance is attached to each of these states.

Now suppose that at the beginning of this second year the policy-maker again places himself at the start of a new ten-year program. He then asks how the situation in which he ended the first time will be further transformed. The answer is given by another multiplication, this time of the vector b′ by the same matrix Q as before.† The outcome of this second multiplication is a vector b″ representing various states at the beginning of the second year of the new ten-year program, again with a certain probability of appearance attached to each of these states.

†Because of the special feature of this problem, it was possible to perform this second computation on an IBM 650 equipped with index registers and able to perform floating-point arithmetic. By making extensive use of the table-look-up facility, the machine took about three minutes to carry out one such multiplication.

If this process is repeated a number of times, an interesting property of the system comes to the fore. The probability that the system is in a state with a (2,2,0) internal state grows the more often the process is repeated and approaches the value 1. Meanwhile, the probabilities of the other states gradually diminish and tend to 0. This means that repeating a ten-year program year after year will ultimately bring the system to a (2,2,0) internal state, in which it will then remain. This is what is meant by the technical phrase "convergence in policy space."

This property becomes even more interesting when other initial situations are considered as well. It then turns out that, whatever the initial situation (i.e., wherever the value 1 is located in the original vector b), the system always ends up in a state in which this particular internal state (2,2,0) has a high probability of appearance. This indicates that the system has an inherent ergodic property, which is of interest in itself.

At this point it should be observed that the same result would have been obtained if the matrix Q had first been multiplied by itself a number of times before the vector-matrix multiplication was done. Computing the powers of Q in ascending order will in the end yield a matrix \(\lim_{n\to\infty} Q^n\). This matrix has nonzero elements only in the (2,2,0) columns and zeros everywhere else. The elements of its (n.w.; 2,2,0) column will all have the value 5/30, those of the (c.w.; 2,2,0) column the value 20/30, and those of the (l.w.; 2,2,0) column the value 5/30. This conjecture is based on two relationships: the one between the matrices P and Q, and the one between \(\lim_{n\to\infty} P^n\) and \(\lim_{n\to\infty} Q^n\). It can then be said that repeating a 10-year policy year after year turns the system, as described here, into an absorbing Markov chain.†
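The convergence described above can be reproduced on a small hypothetical system. It has three internal states and a made-up policy in which internal state 2 maps to itself in every external state and is eventually reached from every other state. P has the quoted no-war and cold-war rows plus an assumed limited-war row (0.10, 0.40, 0.50). Each run starts from a unit vector b and multiplies it by Q repeatedly. In every run, the probability mass collects in the three states with internal state 2, split as 5/30, 20/30, 5/30, the long-run probabilities of the political situations.

```python
from typing import List

# External states: 0 = no war, 1 = cold war, 2 = limited war.
P = [[0.70, 0.20, 0.10],
     [0.05, 0.85, 0.10],
     [0.10, 0.40, 0.50]]      # last row ASSUMED
# HYPOTHETICAL policy over 3 internal states; internal state 2 maps to itself
# in every external state and is eventually reached from every other state.
policy = [[1, 2, 2],
          [1, 1, 2],
          [2, 2, 2]]
N_EXT, N_INT = 3, 3
ABSORBING = 2


def build_q() -> List[List[float]]:
    """Q[(e,k)][(f,l)] = P[e][f] when l = policy[e][k]; state (e,k) has index e*N_INT + k."""
    size = N_EXT * N_INT
    Q = [[0.0] * size for _ in range(size)]
    for e in range(N_EXT):
        for k in range(N_INT):
            for f in range(N_EXT):
                Q[e * N_INT + k][f * N_INT + policy[e][k]] = P[e][f]
    return Q


def distribution_after(start: int, steps: int, Q: List[List[float]]) -> List[float]:
    """Start from the unit vector b with a 1 at `start` and form b Q^steps."""
    b = [0.0] * len(Q)
    b[start] = 1.0
    for _ in range(steps):
        b = [sum(b[i] * Q[i][j] for i in range(len(b))) for j in range(len(Q))]
    return b


Q = build_q()
for start in range(N_EXT * N_INT):       # every possible initial state
    b = distribution_after(start, 300, Q)
    absorbed = [b[f * N_INT + ABSORBING] for f in range(N_EXT)]
    print(start, [round(p, 4) for p in absorbed])  # [0.1667, 0.6667, 0.1667] for every start
```

The external-state part of b evolves under P alone, whatever the internal state. The split across the absorbing columns is therefore the limit of \(P^n\), which is the relationship between P and Q that the text appeals to.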

It should be remarked that the (2,2,0) internal state of the system is the one in which both plants have their largest admissible capacity and the stockpile contains no ore at all. This stresses the importance of the domestic resources. It also shows the value of developing the technology by which these low-quality domestic ores can be made suitable for the production of alloys. Of course,
this conclusion is subject to the assumption that the price-cost relation- 
ships between imported and domestically produced ores will not materially 
change in favor of the foreign ores. Moreover, this result is subject to the 
assumption that the probabilities of transition of the three political situa- 
tions will remain the same during the period under consideration. 

†For a description of the properties of an absorbing Markov chain, the reader is directed to Ref. 4, in particular Chapter XV, Section 6, and Chapter XVI, Sections 1 and 4.
source in the model. Because of the constraints on water usage, each hour's operation cannot be considered separately, so the problem involves 16 × 24 = 384 variables.
An Application of Linear Programming to the Fairing of Ships' Lines†

S. A. Berger
W. C. Webster

1. INTRODUCTION 

When a shipyard contracts to build a ship, it is given a small drawing, called the "lines plan," which describes the geometrical shape of the ship's hull. This drawing consists of three interrelated sets of curves. One set represents the intersection of the hull with a series of horizontal planes parallel to the keel; these cuts are called "waterlines." A vertical set perpendicular to the centerplane is called "stations," and another set parallel to the centerplane is called "buttocks." Fig. 1 is the lines plan of a typical ship.

A naval architect has drawn these curves using a spline: a thin, pliable beam held in place with weights. He ensures that these curves are "fair," or smooth and pleasing to the eye, because he wants the ship's surface to have this property as well. It has been felt that ships so designed will not suffer from the loss of performance sometimes associated with bumps or unfairness of the ship's hull.

As it stands, this drawing (Fig. 1), about one-fiftieth or one-hundredth the size of the full ship, is far too inaccurate for direct use in building a ship. The full-scale tolerance of about 1/8 inch cannot be perceived on such a small drawing. It then becomes the task of the shipyard to expand the scale of this plan. Since these curves have no mathematical definition, such scaling up is not at all a trivial matter.

The traditional method of attack for this problem is to measure from 
the lines plan a sufficient number of points to describe the curves. These 
points are laid down, full scale, on a mold loft floor. The loftsman then re- 
constructs the lines plan by drawing fair curves through or as close as 
possible to these scaled points. During this process any reading errors or 
inaccuracies caused by scaling are detected visually and corrected. 

The problem we have undertaken is to perform the same task mathematically. In itself, this mathematical lofting would not offer any particular advantages over the previous manual approach. However, with the advent of automation, it is imperative that such a mathematical definition exist in order to use computer-controlled fabricating machines.

†The application of linear programming to curve fitting, as presented here, is the result of work performed by Todd Shipyards Corporation, Research and Development Group, in partial fulfillment of the U.S. Navy Department's Bureau of Ships Contract NObs-4427, administered by Code 770, with the joint sponsorship of the U.S. Maritime Administration.

2. FAIRNESS CRITERIA 

In order to be able to produce a curve which is fair, it is first necessary 
to isolate the properties determining fairness. With current practice as 
our guide, we find that fair curves must at least: 

2-a. Be class C² functions. The naval architect ensures that the curves are smooth by using a spline to draw them. Thin beam theory predicts that these curves will be functions whose second derivative is continuous everywhere.

2-b. Have no extraneous inflection points. When a naval architect draws the lines plan, he arranges the spline and its constraining weights so that the curve it assumes is free from bumps. The process is repeated in the mold loft. For a curve satisfying 2-a above, a bump is the occurrence of two closely spaced inflection points. Since the curves of the lines plan are drawn by the naval architect and are fair, we must make sure that the curves resulting from the fitting process have only those inflection points indicated on this drawing.
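Criterion 2-b can be checked numerically by counting sign changes in the second divided differences of a set of offsets. These differences serve as a discrete stand-in for the sign of the second derivative. The following is a minimal sketch, assuming the points are sorted by abscissa and treating differences within a small tolerance as zero. It illustrates the criterion and is not the authors' procedure. The sample offsets are hypothetical.

```python
from typing import List, Sequence


def second_divided_differences(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """f[x_{i-1}, x_i, x_{i+1}] for each interior point (half the second
    derivative of the parabola through the three points)."""
    out = []
    for i in range(1, len(xs) - 1):
        d1 = (ys[i] - ys[i - 1]) / (xs[i] - xs[i - 1])
        d2 = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        out.append((d2 - d1) / (xs[i + 1] - xs[i - 1]))
    return out


def count_inflections(xs: Sequence[float], ys: Sequence[float], tol: float = 1e-12) -> int:
    """Number of sign changes of the second divided differences, ignoring near-zero values."""
    signs = [1 if d > tol else -1 for d in second_divided_differences(xs, ys) if abs(d) > tol]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


xs = [0, 1, 2, 3, 4, 5, 6]
fair = [x * x for x in xs]              # convex: no inflection
bumpy = [0, 1, 4, 15, 16, 25, 36]       # one misread offset at x = 3
print(count_inflections(xs, fair), count_inflections(xs, bumpy))  # 0 2
```

The single misread offset produces two inflections close together, which is exactly the "bump" of criterion 2-b.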

These two criteria are not sufficient, since it is possible to construct curves that satisfy them and yet are not pleasing to the eye. However, experience has indicated that a curve is satisfactory if it satisfies 2-a and 2-b and is the best fit to a set of points scaled from a fair curve. Here we define "best fit" as the curve whose maximum deviation from the given points is a minimum.
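This minimax criterion can be posed as a linear program whenever the fitted curve depends linearly on its unknown coefficients. As an illustration only, assume a curve \(f(x) = \sum_{k=1}^{K} a_k \varphi_k(x)\) built from fixed basis functions \(\varphi_k\). This form is a hypothetical choice, since the representation used in the paper is not given in this excerpt. For points \((x_i, y_i)\), \(i = 1, \ldots, N\), the problem is then:

```latex
\begin{aligned}
\min_{a_1,\dots,a_K,\;z}\quad & z\\
\text{subject to}\quad & y_i - \sum_{k=1}^{K} a_k\,\varphi_k(x_i) \le z, && i = 1,\dots,N,\\
& \sum_{k=1}^{K} a_k\,\varphi_k(x_i) - y_i \le z, && i = 1,\dots,N.
\end{aligned}
```

At the optimum, z equals the smallest attainable maximum deviation. Because \(f''\) is also linear in the \(a_k\), a prescribed sign of the second derivative at chosen points could be added as further linear inequalities. This is one way conditions such as 2-b might enter the same program.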



3. PRELIMINARY SMOOTHING 

The data one obtains from the lines plan usually consists of offsets, that 
is, the scaled distances of the waterlines and stations from the centerplane. 
These data are prone to certain difficulties. First, the accuracy of the data 
is limited by the measuring method. The offsets are read from a drawing 
with lines of finite width by a scale of finite precision. Second, since there 
are usually at least two hundred points given to describe a ship, it is not at 
all unreasonable to assume that some of them might embody large errors 
due to reading errors, transcribing errors, etc. Thus these data need not 
be exact. There may be a few points that are totally in error and do not 
bear any information. Superimposed on all the rest of the points are small, 
random errors due to the mensuration. It is crucial that these bad points 
be rejected if the naval architect's intentions are to be preserved in the 
curve fitting process. 

Such errors are now detected in the mold loft when the loftsman notices 
that it is impossible to pass a spline through a set of points without pro- 
ducing a bump. A similar procedure can be performed numerically. 









We shall assume that: 

3-a. The offsets to be examined are, for the most part, reasonable 
points and the bad points are not closely spaced. 

3-b. The set of points sufficiently describes a ship which was intended to
be fair. Presumably, the points form a matrix of waterlines and stations 
faired by the naval architect. Thus it is reasonable to expect to be able to 
pass fair curves through, or at least very near, the good points representing 
the individual waterlines and stations. 

The strategy for detecting bad points involves determining whether these
points, which were scaled from and which represent a fair curve, are
compatible with the fairness criteria. Any incompatibility will be interpreted
as the existence of a bad point. Since continuity of any order cannot be a
property of discrete points, the question of compatibility becomes: From
the myriad of class $C^2$ functions which pass through these points, is it
possible for one of them to be free of bumps?

Consider the three points shown in Fig. 2. By the mean value theorem 
there must be some region in the interval from points 1 to 3 in which the 
second derivative of any class $C^2$ function has the same sign as the second
difference computed at point 2. For the points shown the second difference 
is negative and there must be at least a region in this interval with negative 
second derivative. 
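For unequally spaced points, one natural choice for the "second difference" at point 2 is twice the second divided difference. Assuming that choice, the mean value theorem for divided differences states that, for any $C^2$ function $Y$ passing through the three points,

$$\frac{2}{x_3 - x_1}\left(\frac{y_3 - y_2}{x_3 - x_2} - \frac{y_2 - y_1}{x_2 - x_1}\right) = Y''(\xi) \quad \text{for some } \xi \in (x_1, x_3),$$

so the sign of this quantity is the sign of $Y''$ somewhere between points 1 and 3.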

If there is a fourth point adjoined to these three such that the second
difference computed at 3 is positive, then there must be some region near 3,
in the interval 2 to 4, which has a positive second derivative. Thus, in the
interval 1 to 4 the second derivative must change in sign and there must be
an inflection point. Two consecutive sign changes in the second differences
at three neighboring points indicate two inflection points. Since this would
be the shortest interval in which one could predict the existence of two
inflection points, i.e., a bump, we must conclude that such points contradict
our assumptions. It seems unreasonable to assume that a naval architect would
have intended such a set of points since, if he had desired the curvature to
change so often, he would have given more information to describe the shape
of the curve in this region.

Once an inconsistency is discovered it is an easy matter to locate the
bad point from the pattern of second differences in the neighborhood of this 
difficulty. We have written a program which goes through this process of 
detecting and locating the bad points. This program adjusts the bad points 
so that they become compatible with the other points. 
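The detection program itself is not listed, so the following Python sketch is only an illustration under stated assumptions: second differences are taken in the divided-difference form above, a "bump" is two or more consecutive sign changes among them, and the bad point is guessed as the point with the largest |r| in the offending run. That last rule and the names `second_differences` and `suspect_points` are hypothetical, and the sketch does not attempt the adjustment step.

```python
def second_differences(x, y):
    """Twice the second divided difference at each interior point.

    r[i] has the sign of Y''(xi) for some xi in (x[i-1], x[i+1]) for any C2
    curve through the three points. The end entries are None because no
    difference can be formed there.
    """
    n = len(x)
    r = [None] * n
    for i in range(1, n - 1):
        left = (y[i] - y[i - 1]) / (x[i] - x[i - 1])
        right = (y[i + 1] - y[i]) / (x[i + 1] - x[i])
        r[i] = 2.0 * (right - left) / (x[i + 1] - x[i - 1])
    return r


def suspect_points(x, y):
    """Flag likely bad offsets (hypothetical heuristic).

    A run of two or more consecutive sign changes in r predicts a bump; within
    each run the point with the largest |r| is reported as the suspect.
    """
    r = second_differences(x, y)
    n = len(x)
    # i is a sign change when r[i] and r[i + 1] have strictly opposite signs
    changes = [i for i in range(1, n - 2) if r[i] * r[i + 1] < 0]
    suspects, run = [], []
    for i in changes + [None]:          # None is a sentinel that flushes the last run
        if run and (i is None or i != run[-1] + 1):
            if len(run) >= 2:           # two sign changes: two inflection points
                window = range(run[0], run[-1] + 2)
                suspects.append(max(window, key=lambda k: abs(r[k])))
            run = []
        if i is not None:
            run.append(i)
    return suspects
```

For offsets scaled from $y = x^2$ with one point pushed down, the single spike produces alternating signs in the neighboring second differences, and the spike itself carries the largest second difference.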



4. CURVE FORMS 

There are two possible general approaches to the problem of mathe- 
matically fairing a ship's hull. The first method, the analogue of current 
manual lofting practice, is to fair all of the waterlines and stations indi- 
vidually, and since, in general, the set of waterlines will not be compatible 
with the set of stations, to repeat the process in an iterative fashion until 
convergence criteria are met. The second method is to treat the problem 
directly as a surface fairing problem in three dimensions. The advantage 
of the first method as compared to the second is that it reduces the complex 
problem of fairing a three-dimensional surface to the conceptually simpler 
one of fairing a number of closed curves in a plane. However, this advan- 
tage could be partly or completely offset by such difficulties as convergence 
of the iteration and lack of fairness of intermediate waterlines and stations. 
Putting aside such considerations for a while, let us first consider the prob- 
lem of fairing individual waterlines and stations as two-dimensional fairing 
problems. The difficulties indicated above, as well as the direct problem of
fairing in three dimensions, will be considered in later sections of the paper.
The problem of interpolating between these faired waterlines and stations 
(a difficulty which does not exist in fitting a surface) can be solved by a 
method given recently by Birkhoff and Garabedian [1]. 

The first decision one must make in fairing a set of points in a plane in- 
volves the selection of the equation to be employed. 

In attempting to fit a waterline or station with one analytic expression 
certain difficulties arise; these may be traced to the fact that current loft- 
ing practice (using splines) makes these contours into segmented analytic 
curves, and hence not representable by one analytic function [2]. As indi- 
cated earlier, the current practice is to fair waterlines and stations using 
a spline. Under small deflections splines assume the shape of a segmented 
polynomial of third degree. That is, this curve is a set of cubic equations 
joined in such a way that the resulting curve is of class C 2 . The non- 
analyticity of this curve allows greater freedom in obtaining acceptable 
ship forms. In particular, straight portions, being special cubic curves, are
easily included in such a curve; in addition, and most significantly, one
can readily control inflection points on this type of curve. Here the second
derivative varies linearly in the regions between the joins and is continuous
at the joins. An inflection point occurs in the interval between two adjacent
joins if and only if the second derivative at one join is opposite in sign to
the second derivative at the other. If the joins of the spline curve are taken
at the points where the original data are given, and if one constrains the
spline curve to have second derivatives whose signs match the second
differences at these points, then this curve will not have any extraneous
inflection points. Here pre-smoothing prevents any undesired behavior of the
second derivatives in the neighborhood of these points, and the properties of
the spline curve prevent any difficulties from arising between these points.
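Written out, the linear variation of the second derivative between adjacent joins $x_i$ and $x_{i+1}$ is

$$Y''(x) = Y''(x_i)\,\frac{x_{i+1} - x}{x_{i+1} - x_i} + Y''(x_{i+1})\,\frac{x - x_i}{x_{i+1} - x_i}, \qquad x_i \le x \le x_{i+1},$$

so $Y''$ changes sign inside $(x_i, x_{i+1})$ exactly when $Y''(x_i)\,Y''(x_{i+1}) < 0$, and the inflection point then lies at

$$x = x_i + (x_{i+1} - x_i)\,\frac{Y''(x_i)}{Y''(x_i) - Y''(x_{i+1})}.$$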

For these reasons it was apparent that the spline curve is eminently 
well suited for the fitting of fair curves on a ship. 



5. LINEAR PROGRAMMING FORMULATION 

The problem of fitting a fair curve to these points is now quite obviously
a linear programming problem. Suppose we are given n points $(x_i, y_i)$
through which we would like to pass a fair curve, and let $r_i$ be the
second difference computed at $x_i$. The resulting linear program is:
minimize $\lambda$ subject to

$$
\begin{aligned}
-\lambda + Y(x_i) &\le y_i && (1)\\
-\lambda - Y(x_i) &\le -y_i, \qquad i = 1, \ldots, n && (2)\\
-r_i\,Y''(x_i) &\le 0 && (3)
\end{aligned}
$$

Eqs. (1) and (2) constrain the spline curve $Y(x)$ to be within $\lambda$ of
the given data. Eq. (3) constrains the second derivative at the data points
$x_i$ to have the same sign as the second difference $r_i$. Minimizing
$\lambda$ produces the best fit according to our definition.

Notice that Eq. (3) is location independent. That is, a curve so fitted is 
still fair even if one translates the curve up or down by any amount. This 
property allows us to draw the very important conclusion that it is impos- 
sible for the process of iterating waterlines and stations to diverge. At 
worst the solution can oscillate. However, our experience has shown that 
indeed the iteration converges and does so quite rapidly. 

In order to solve this linear program one has to represent $Y(x)$ as a
linear function of positive parameters. We have used the representation
for the spline curve suggested by Theilheimer [3], which is:

$$Y(x) = a_0 + a_1 x + a_2 x^2 + \sum_{j=1}^{n-1} A_j\,(x - x_j)_+^3 \qquad (4)$$

where

$$(x - x_j)_+^3 = (x - x_j)^3 \ \text{ if } x \ge x_j, \qquad (x - x_j)_+^3 = 0 \ \text{ if } x < x_j.$$

The "plus" notation in the summation permits one to represent this
nonanalytic curve with one equation and a minimum of parameters. In order
to make $Y(x)$ a function of positive parameters (to satisfy the requirements
of a linear program), no generality is lost if one writes each parameter as
a difference of two nonnegative ones:

$$a = a' - a'', \qquad a', a'' \ge 0.$$
In these terms, the linear program becomes

$$
\begin{aligned}
-\lambda + (a_0'-a_0'') + (a_1'-a_1'')x_i + (a_2'-a_2'')x_i^2 + \sum_{j=1}^{i-1}(A_j'-A_j'')(x_i-x_j)^3 &\le y_i && (5)\\
-\lambda - (a_0'-a_0'') - (a_1'-a_1'')x_i - (a_2'-a_2'')x_i^2 - \sum_{j=1}^{i-1}(A_j'-A_j'')(x_i-x_j)^3 &\le -y_i && (6)\\
-2r_i(a_2'-a_2'') - 6r_i\sum_{j=1}^{i-1}(A_j'-A_j'')(x_i-x_j) &\le 0 && (7)
\end{aligned}
$$

$i = 1, \ldots, n$.

Equations (5)-(7) represent a tableau of
3n equations, 2n + 5 variables
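As a rough illustration of how Eqs. (1)-(4) fit together, the Python sketch below builds the rows of (5)-(7) numerically and hands them to SciPy's `linprog` with the HiGHS backend; the solver is an assumption of this example, not the code used for the work described here. Because `linprog` accepts free variables through its `bounds` argument, the split $a = a' - a''$ is not carried out explicitly. The names `plus_cubed` and `fair_fit_theilheimer` are hypothetical, and the end points are simply given $r_i = 0$, i.e., no sign condition.

```python
import numpy as np
from scipy.optimize import linprog


def plus_cubed(t):
    """Truncated power (t)_+^3 of Eq. (4): t**3 where t > 0, else 0."""
    return np.where(t > 0, t, 0.0) ** 3


def fair_fit_theilheimer(x, y, r):
    """Minimax fit of Eqs. (1)-(3) with Y(x) written as in Eq. (4).

    Unknowns are a0, a1, a2, A_1..A_{n-1} (n + 2 spline parameters) and lam.
    r[i] is the second difference at x[i]; pass 0 where no sign is imposed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(x)
    knots = x[:-1]                      # joins x_1 .. x_{n-1}
    diff = x[:, None] - knots[None, :]  # x_i - x_j for every pair
    # Row i of V evaluates Y(x_i); row i of D evaluates Y''(x_i).
    V = np.column_stack([np.ones(n), x, x**2, plus_cubed(diff)])
    D = np.column_stack([np.zeros(n), np.zeros(n), np.full(n, 2.0),
                         6.0 * np.clip(diff, 0.0, None)])
    m = V.shape[1]
    col = np.ones((n, 1))
    A_ub = np.vstack([
        np.hstack([V, -col]),                       # (1) -lam + Y(x_i) <= y_i
        np.hstack([-V, -col]),                      # (2) -lam - Y(x_i) <= -y_i
        np.hstack([-r[:, None] * D, 0.0 * col]),    # (3) -r_i Y''(x_i) <= 0
    ])
    b_ub = np.concatenate([y, -y, np.zeros(n)])
    cost = np.zeros(m + 1)
    cost[-1] = 1.0                                  # minimize lam only
    # Free spline parameters replace the a' - a'' split; lam is nonnegative.
    bounds = [(None, None)] * m + [(0.0, None)]
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:m], res.x[-1], V, D
```

For offsets taken from $y = x^2$ the optimum has $\lambda = 0$, since a parabola is itself a spline of this form with positive curvature everywhere.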

We have done a great amount of work with this formulation and have 
found the results very satisfactory. However there are some undesirable 
features. First, the tableau is very dense. That is, more than one half of 
the possible elements are nonzero. This leads to storage problems with 
certain codes. Second, there is no immediate feasible solution. This is not
terribly important, but it does require some time first to find a feasible
solution.

These two problems can be helped somewhat if one adds inequalities (5) 
to (6). This eliminates about half of the nonzero elements but requires the 
addition of one slack per point to maintain the sense of the inequalities. 
This slightly revised formulation increases the linear program tableau to 
(3n + 5) variables. 

It is quite clear that even small curve fitting problems become large 
linear programs. As a result, several steps were taken to improve 
efficiency. 

6. IMPROVEMENTS 

Duality 

As originally noted by Kelly [4], curve fitting tableaus, and this one is
no exception, tend to have more constraints than variables. Thus it is
advantageous to use the dual formulation. This is of particular interest
since the corresponding dual problem has an immediate feasible solution,
with all dual variables zero, because the cost vector of the primal problem
is so simple: its only nonzero entry is the coefficient 1 of $\lambda$. For
our problems duality thus offers significant advantages.
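A toy check of the duality remark, assuming SciPy's `linprog`: fitting a single constant $p \ge 0$ to the points in the sense of Eqs. (1) and (2) gives a primal whose cost vector is $(0, 1)$. For a primal of the form minimize $c^T x$ subject to $Ax \le b$, $x \ge 0$, the dual is maximize $b^T u$ subject to $A^T u \le c$, $u \le 0$, and $u = 0$ is feasible whenever $c \ge 0$. The function name `primal_and_dual` and the data are illustrative only.

```python
import numpy as np
from scipy.optimize import linprog


def primal_and_dual(y):
    """Toy minimax fit of a constant p >= 0, as in Eqs. (1)-(2):
    minimize lam subject to  p - lam <= y_i  and  -p - lam <= -y_i.
    Returns (primal optimum, dual optimum, A, c)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.vstack([np.column_stack([np.ones(n), -np.ones(n)]),
                   np.column_stack([-np.ones(n), -np.ones(n)])])
    b = np.concatenate([y, -y])
    c = np.array([0.0, 1.0])          # cost vector: only lam is priced
    primal = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2, method="highs")
    # Dual of  min c.x, A x <= b, x >= 0  is  max b.u, A^T u <= c, u <= 0.
    # u = 0 satisfies A^T u <= c because c >= 0: an immediate feasible point.
    dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, 0)] * (2 * n),
                   method="highs")
    return primal.fun, -dual.fun, A, c
```

For the points 1, 4, 2 both problems should return 1.5, half the spread of the data.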

Representation 

Notice that Theilheimer's notation is by no means unique. A spline 
curve fitted to n points, of the type previously assumed, has n + 2 degrees 
of freedom. In principle one could choose any (n + 2) independent properties 
of the curve and these will determine the curve. 



Offset Representation 

One set of parameters that appears obvious to consider is the final
ordinates after the fairing is complete. Suppose that the curve $Y(x)$ which
is fitted to the points $(x_i, y_i)$ passes through the points $(x_i, \bar y_i)$.
We can choose these $\bar y_i$ as n of the n + 2 parameters. Obviously they
are independent. The two additional parameters chosen were $a_1$, the slope
at the initial point $x_1$, and $a_2$, the second derivative at $x_1$.

As before we must assure that all the parameters are positive if a linear 
programming formulation is to be used. Since we are dealing with curves 
which represent a real ship's lines, it is unreasonable to allow any of the 
offsets to be negative. (Because ships are symmetric about the centerplane,
we need fair only the positive half of the hull.) The requirement
$\bar y_i \ge 0$ is not only compatible with linear programming requirements
but also with natural requirements.

In terms of equations (1)-(3), we have:

$$
\begin{aligned}
-\lambda + \bar y_i &\le y_i && (8)\\
-\lambda - \bar y_i &\le -y_i && (9)\\
-r_i\,Y''(x_i) &\le 0 && (10)
\end{aligned}
$$

In this representation, one notices that Eqs. (8) and (9) are considerably
simpler than their counterparts in Theilheimer's representation, Eqs. (5) and
(6). However, with Theilheimer's notation it was possible to write out the
counterpart to Eq. (10), Eq. (7), explicitly. In the above notation this is
not as simple to do. However, one can derive a set of recurrence relations
which permit the construction of $Y''(x_i)$.
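The recurrence relations themselves are not given here. One possible set, offered only as an assumption-labeled sketch, marches from $x_1$: the value, slope, and second derivative at $x_i$, together with the next offset $\bar y_{i+1}$, fix the cubic on $[x_i, x_{i+1}]$, and continuity carries the slope and second derivative on to $x_{i+1}$. Here `curv1` is taken to be $Y''(x_1)$ itself, and the function name is hypothetical. Every output is linear in $(\bar y_1, \ldots, \bar y_n)$, the slope, and `curv1`, so evaluating it on unit vectors would give the coefficient rows needed for Eq. (10).

```python
def second_derivatives_from_offsets(x, ybar, slope1, curv1):
    """Return [Y''(x_1), ..., Y''(x_n)] for the C2 cubic spline through the
    faired offsets ybar, given the slope and second derivative at x_1.

    Each step fixes the cubic on [x_i, x_{i+1}] from the value, slope and
    second derivative carried over from x_i plus the next offset, then
    carries slope and second derivative forward (continuity of Y' and Y'').
    """
    s, M = float(slope1), float(curv1)
    out = [M]
    for i in range(len(x) - 1):
        h = x[i + 1] - x[i]
        # On [x_i, x_{i+1}]: Y = ybar_i + s t + M t^2 / 2 + c3 t^3, t = x - x_i
        c3 = (ybar[i + 1] - ybar[i] - s * h - 0.5 * M * h * h) / h**3
        s = s + M * h + 3.0 * c3 * h * h   # Y'(x_{i+1})
        M = M + 6.0 * c3 * h               # Y''(x_{i+1})
        out.append(M)
    return out
```

For a long run of points this forward march can amplify rounding errors, so a production code might prefer the usual tridiagonal spline equations; the sketch is kept deliberately simple.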

Equations (8)-(10) represent a linear program of (using duality):

n + 5 equations, 3n variables

This is a considerable decrease in the size of tableau required. It is
worth noting that, besides being smaller, the tableau formed by (8)-(10)
is less dense than that formed by (5)-(7). The conclusion is then that the
offset notation greatly decreases the tableau size and decreases even more
greatly the number of nonzero entries required for a problem.

Deviation Representation 

Let us consider the case when n of the independent variables are the
deviations $\delta_i$ from the given ordinates $y_i$. Thus:

$$\bar y_i = y_i + \delta_i \qquad (11)$$

The reasons for going to such a notation are not at all obvious. If one
tries to use these $\delta_i$ as a basis for a linear program of the type
previously formulated, it is immediately clear that two variables $\delta_i'$
and $\delta_i''$ are needed at each point in order that the deviation be
unrestricted in sign. This would, to be sure, undermine the effort to
simplify the solution. Instead, let us consider a slightly different problem.
Let us take the $\delta_i$ as legitimate linear programming variables, where
the $\delta_i$ are required to be equal to or greater than zero. Let us
suppose that we subject the spline curve to the following constraints:

$$-\lambda + \delta_i \le 0 \qquad (12)$$

$$-r_i\,Y''(x_i) \le 0 \qquad (13)$$

Equation (13) is the same as Eq. (3). Equation (12) ensures that $\lambda$ is
equal to or greater than every deviation $\delta_i$; since all of the
$\delta_i$ are constrained to be nonnegative, each $\delta_i$ is itself the
absolute deviation at $x_i$. When one minimizes $\lambda$ subject to (12) and
(13), one determines the spline curve which:

(a) has the minimum maximum deviation from the given points, among the curves
satisfying (13) that lie on or above them,

(b) lies wholly above the given points. That is, $\bar y_i \ge y_i$.

Clearly this is not the same solution obtained from equations (1)-(3).
This solution is denoted by $\hat Y(x)$, and the corresponding values of
$\lambda$ and $\delta_i$ as $\hat\lambda$ and $\hat\delta_i$.

Consider the curve given by translating $\hat Y(x)$ down by $\hat\lambda/2$.
That is:

$$\tilde Y(x) = \hat Y(x) - \hat\lambda/2$$

Several things can be proved about the curve $\tilde Y(x)$. First, the maximum
deviation of $\tilde Y(x)$ from the original points is at most $\hat\lambda/2$.
This follows immediately from the fact that the curve $\hat Y(x)$ is wholly
above the points but is still within $\hat\lambda$ of them. Second,
$\tilde Y(x)$ is a best-fit curve in the sense of Eqs. (1)-(3). Obviously this
curve satisfies Eq. (3), since a translation does not change $Y''$. If there
were another curve satisfying Eq. (3) with a smaller maximum deviation
$\lambda < \hat\lambda/2$ from the points, then this new curve, translated up
by $\lambda$, would lie wholly above the points with a maximum deviation of at
most $2\lambda < \hat\lambda$, contradicting (a) and (b) above. This not very
obvious result means that if we use the deviations from the given points as
n of the independent variables, a linear program can be set up that requires
only two constraints per point. One subtracts one half of the calculated
maximum deviation from this curve to get the desired curve.
This process requires (using duality):
n + 5 equations, 2n variables

The value of $Y''(x_i)$ for this case can, of course, be calculated by the
same recurrence relations as used in the offset notation.
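The translation argument can be checked numerically. To keep the sketch short it replaces the spline by a single cubic polynomial, an assumption made only for illustration; the argument itself needs nothing more than that adding a constant to $Y$ leaves Eq. (3) unchanged. Both problems are solved with SciPy's `linprog`, and `minimax_cubic` is a hypothetical helper: with `one_sided=False` it solves Eqs. (1)-(3), and with `one_sided=True` it solves Eqs. (12)-(13) with $Y(x_i) = y_i + \delta_i$.

```python
import numpy as np
from scipy.optimize import linprog


def minimax_cubic(x, y, r, one_sided=False):
    """Minimax fit of Y(x) = p0 + p1 x + p2 x^2 + p3 x^3 with r_i Y''(x_i) >= 0.

    one_sided=False solves Eqs. (1)-(3); one_sided=True solves Eqs. (12)-(13)
    with Y(x_i) = y_i + delta_i and delta_i >= 0.  Returns (p, lam).
    """
    x, y, r = (np.asarray(a, dtype=float) for a in (x, y, r))
    n = len(x)
    V = np.column_stack([np.ones(n), x, x**2, x**3])                  # Y(x_i)
    D = np.column_stack([np.zeros(n), np.zeros(n), np.full(n, 2.0), 6.0 * x])
    curv = -r[:, None] * D                                            # -r_i Y''(x_i) <= 0
    if not one_sided:
        # variables: p0..p3, lam
        col = np.ones((n, 1))
        A_ub = np.vstack([np.hstack([V, -col]),            # (1)
                          np.hstack([-V, -col]),           # (2)
                          np.hstack([curv, 0.0 * col])])   # (3)
        b_ub = np.concatenate([y, -y, np.zeros(n)])
        A_eq, b_eq = None, None
        bounds = [(None, None)] * 4 + [(0.0, None)]
    else:
        # variables: p0..p3, delta_1..delta_n, lam
        eye = np.eye(n)
        A_ub = np.vstack([np.hstack([np.zeros((n, 4)), eye, -np.ones((n, 1))]),  # (12)
                          np.hstack([curv, np.zeros((n, n + 1))])])              # (13)
        b_ub = np.zeros(2 * n)
        A_eq = np.hstack([V, -eye, np.zeros((n, 1))])      # Y(x_i) - delta_i = y_i
        b_eq = y
        bounds = [(None, None)] * 4 + [(0.0, None)] * (n + 1)
    cost = np.zeros(A_ub.shape[1])
    cost[-1] = 1.0                                         # minimize lam
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:4], res.x[-1]
```

Running both modes on the same offsets should give $\hat\lambda/2$ equal to the directly computed $\lambda$, up to solver tolerance.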

Second Derivative Representation 

There is another representation which can match the gains achieved by
the deviation representation and may have some ultimate advantages. This
time let us take $c_i$, the second derivatives at the points $x_i$, as n of
the n + 2 required variables. For the other two variables let us choose
$\bar y_1$, the faired ordinate at $x_1$, and $(a_1' - a_1'')$, the slope at $x_1$.

Again we are faced with the problem of the nonnegative requirement.
However, notice that the sole purpose of (3) is to impose a certain sign,
that of $r_i$, on $Y''(x_i)$.

Consider a new set of independent variables $C_i$, given by

$$C_i = r_i c_i = r_i\,Y''(x_i) \qquad (14)$$

Whenever $c_i$ satisfies Eq. (3), $C_i$ is nonnegative. Thus if the $C_i$ are
chosen as independent variables, they can be taken as linear programming
variables and as such obviate the necessity of using Eq. (3) for each point.

It is not necessary to let $\bar y_1$ take on negative values, for the reasons
discussed in the development of the offset notation. In this case Eqs. (1)
and (2) become:

$$-\lambda + Y(x_i) \le y_i \qquad (15)$$

$$-\lambda - Y(x_i) \le -y_i \qquad (16)$$

Here again it is difficult to express $Y(x_i)$ in explicit form. However, as
before, recurrence relations can be found from which it is easy to
construct $Y(x_i)$.
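Again the recurrence is not spelled out. Since $Y''$ varies linearly between joins, integrating it twice gives one possible recurrence, sketched here under that assumption with a hypothetical function name: with $h_i = x_{i+1} - x_i$, $Y(x_{i+1}) = Y(x_i) + Y'(x_i)\,h_i + h_i^2(2c_i + c_{i+1})/6$ and $Y'(x_{i+1}) = Y'(x_i) + h_i(c_i + c_{i+1})/2$. In the linear program the $c_i$ would come from the variables of Eq. (14) as $c_i = C_i / r_i$ (for $r_i \neq 0$).

```python
def offsets_from_second_derivatives(x, y1, slope1, c):
    """Recover [Y(x_1), ..., Y(x_n)] from Y(x_1) = y1, the slope at x_1 and
    the second derivatives c_i, using the fact that Y'' is linear between joins."""
    y, s = float(y1), float(slope1)
    out = [y]
    for i in range(len(x) - 1):
        h = x[i + 1] - x[i]
        # integrate the linear Y'' once for Y' and twice for Y over [x_i, x_{i+1}]
        y = y + s * h + h * h * (2.0 * c[i] + c[i + 1]) / 6.0
        s = s + h * (c[i] + c[i + 1]) / 2.0
        out.append(y)
    return out
```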

This formulation yields a linear program of (again using duality):

n + 4 equations, 2n variables 

This is then equivalent to the reduction afforded by the deviation 
approach. 



7. SURFACE 

As indicated earlier, one could treat the problem of mathematically fairing
a ship hull directly as a surface fairing problem in three dimensions.
Such an approach leads to a much more complex mathematical formulation;
it does, however, completely eliminate the need to apply an iterative scheme,
and the consequent problem of convergence is thereby avoided.

Since, according to current practice, individual waterlines and stations
are segmented spline curves, the most natural approach in attempting a
surface fit would be to use the direct analogue of such curves in three
dimensions, that is, to choose a surface equation with the property that the
curves of intersection of the surface with two families of mutually
perpendicular planes are waterlines and stations which are segmented
spline curves. Such a surface would be analytic in each domain bounded by
two successive waterlines and two successive stations; at the boundaries the
second derivative normal to the boundaries must be continuous, and along the
boundaries of each small surface element all the derivatives in the direction
of the boundaries are continuous.

The equation having the properties indicated above can be written

$$y(x,z) = \sum_{i,j=0}^{2} a_{ij}\,x^i z^j + \sum_{i=1}^{n-1} A_i\,(x - x_i)_+^3 + \sum_{j=1}^{m-1} B_j\,(z - z_j)_+^3 + \cdots \qquad (17)$$

where the symbol $(\;)_+$ is as defined earlier. This equation represents a
segmented cubic surface over the domain

$$x_1 \le x \le x_n, \qquad z_1 \le z \le z_m.$$

The continuity requirements are automatically satisfied by Eq. (17). The
linear programming formulation of this problem then involves requiring
that the deviation at each point of the x-z grid of given points be bounded
by $\lambda$ and that the sign of the curvature in the waterline and station
planes agree with the sign of the corresponding second difference. Apart
from the fact that y is now also a function of z, the only substantial
difference between fairing in two or three dimensions using linear
programming lies in there being two curvature constraints in the latter case
and only one in the former. If $r_{ij}$ and $s_{ij}$ denote the second
differences in the x and z directions, respectively, evaluated at the point
$(x_i, z_j)$, then the linear programming formulation of the problem is as
follows:
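Minimize $\lambda$ subject to the conditions

$$
\begin{aligned}
-\lambda + y(x_i, z_j) &\le y_{ij}\\
-\lambda - y(x_i, z_j) &\le -y_{ij}\\
-r_{ij}\,\frac{\partial^2 y}{\partial x^2}(x_i, z_j) &\le 0\\
-s_{ij}\,\frac{\partial^2 y}{\partial z^2}(x_i, z_j) &\le 0
\end{aligned}
\qquad i = 1, 2, \ldots, n; \quad j = 1, 2, \ldots, m \qquad (18)
$$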
where $y(x_i, z_j)$ represents equation (17) evaluated at the point $(x_i, z_j)$,
while $y_{ij}$ is the given value of y at this point.
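The second differences $r_{ij}$ and $s_{ij}$ that drive the two curvature rows of Eq. (18) can be computed line by line across the grid. Below is a minimal NumPy sketch, assuming the same divided-difference convention as for single curves and leaving the edge entries at zero (no sign imposed there); the function names are hypothetical.

```python
import numpy as np


def second_diff_1d(t, v):
    """Twice the second divided difference of v over abscissas t at interior
    points; the two end entries are left at 0 (no sign imposed)."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    out = np.zeros_like(v)
    left = (v[1:-1] - v[:-2]) / (t[1:-1] - t[:-2])
    right = (v[2:] - v[1:-1]) / (t[2:] - t[1:-1])
    out[1:-1] = 2.0 * (right - left) / (t[2:] - t[:-2])
    return out


def grid_second_differences(x, z, Y):
    """r[i, j] along x (at fixed z_j) and s[i, j] along z (at fixed x_i)
    for offsets Y[i, j] given at the grid points (x_i, z_j)."""
    Y = np.asarray(Y, dtype=float)
    r = np.empty_like(Y)
    s = np.empty_like(Y)
    for j in range(Y.shape[1]):
        r[:, j] = second_diff_1d(x, Y[:, j])
    for i in range(Y.shape[0]):
        s[i, :] = second_diff_1d(z, Y[i, :])
    return r, s
```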

The other representations of splines previously presented can be easily
extended to give a surface representation. However, these representations,
although they offer savings over the Theilheimer-type surface, as they do
with the Theilheimer curve, require a much more complicated procedure
to develop the matrix given by Eq. (18).

8. CONCLUSIONS 

Each of the methods presented above has been successfully used in 
fairing ships' lines. In general, it is found that the surface representation 
requires about two orders of magnitude more time before reaching a solu- 
tion than does the iteration procedure. The surface method, however, re- 
sults in a solution with a smaller maximum deviation than one obtained by 
iterating. Although we cannot prove convergence of the iteration, no
difficulty has thus far arisen in any of our runs using this latter scheme.

Throughout the discussion in this paper we have neglected mention of 
certain areas of the ship's hull which do not conform to the general state- 
ments made about ships' lines. In particular, there are places where the 
second derivative is not continuous. Also, nothing has been said about the 
ends of curves, where one cannot calculate second differences to be used 
in the linear programming formulation. These difficult areas do require 
special attention, but can be effectively included in an overall linear
programming ship fairing program.

There is one additional feature of linear programming which enhances 
its suitability for this problem. This is concerned with the ability to add 
additional constraints to the program without changing the basic formula- 
tion. This feature could be particularly useful in introducing such con- 
straints as fixed cargo capacity, given beam, etc. That is, it would be pos- 
sible for the ship user or naval architect to designate certain parameters 
which he wished held fixed in the fairing process; the only limitation, of 
course, would be that these constraints be linear. 

In conclusion, we feel that linear programming finds an ideal application 
in the fairing of ships' lines. 
The Simulation of Multi-component Distillation†



E. C. De Land
M. B. Wolf 



A new method is proposed for the simulation of multicomponent petroleum
distillation columns. This method takes advantage of the power of
mathematical programming techniques for computing the equilibrium
states of physicochemical processes. The formal procedure was proposed
and developed for other chemical systems, but it is perfectly general,
being able to incorporate changes of phase, external sources or sinks of
mass or energy, and differential equations which describe system dynamics
if they are relatively slow with respect to the chemical dynamics.

Using a theorem of the mathematician Gibbs, a chemical equilibrium
may be defined in terms of the thermodynamic free energy of each of the
components. At equilibrium, the sum of the free energies will be minimized.
In the present paper a (nonlinear) free-energy function is defined and then
minimized under the natural (linear) physical restraints of the system. On
the analog computer, chosen because of the ease of representing the system
dynamics and the (nonlinear) heat and mass balance equations, the solution
method is steepest descent. A digital solution has also been devised but
not yet implemented, because the digital program will be much more
comprehensive than the basic idea which is presented here.



1. INTRODUCTION 

Several procedures have been devised for the simulation of particular
subsystems in a refinery operation, and in particular, since the advent of
computer technology, practical methods have been developed for modeling
multistage, multicomponent distillation on the computer. Amundsen [2],
Lyster [3], Greenstadt [4], and others have described successful programs
on the digital machine; Marr [5], Worley [6], Rijnsdorp and Maarleveld [7],
Computer Systems, Inc. [8], and others discuss simulations on the analog.
Usually these methods are based upon the equations and techniques
developed formerly for hand calculation. Marr [5] is an exception in that he
proposes a set of partial differential equations for the temperature and
composition profiles of the column as a whole.

We propose here to apply a basic notion of thermodynamic equilibrium,
the Gibbs free-energy function, to provide a model for simulation. This
will require that we take full advantage of the high speed, capacity, and
flexibility of the modern computer. There are several advantages, which
we will discuss, to be gained from this method, but principally they arise
from the fact that the method is perfectly general. It is a natural format
for representing the subsidiary chemical reactions and states of a complex
system and for representing classical equilibrium, but it can also be used
to model irreversible thermodynamic processes (deGroot [16], and others)
such as elution, ion exchange, and forcing functions (potentials) of various
kinds. Thus, the method may be used to simulate other elements of the
refinery. Here, we illustrate an application to multicomponent distillation.

The procedure originates in a paper by White et al. [9], and has been
applied in biological systems [10], combustion, planetary atmosphere studies,
and others. The present application was suggested in an earlier paper [11].
The analog computer results were obtained from research for a master's 
thesis by one of the authors [12]. Details of the analog techniques are con- 
tained in Reference 13. 



2. A FRACTIONAL DISTILLATION COLUMN 

It will not be necessary here to describe the fractional distillation
column in great detail (see Reference 1). The basic idea is that a homogeneous
input mixture of n components (the Feed) is to be separated into two
principal fractions, a condensed vapor phase (the Distillate) and a liquid
remainder (the Bottoms), with reasonable efficiency and control by means of
a series of m staged distillations. Fig. 1 illustrates a typical distillation
(a Plate) with communication to the plate above and the plate below. The
scalar $L_k$ represents the rate of flow of liquid from the $k$th plate, with
mole-fraction composition vector $X_k = (x_{k1}, x_{k2}, \ldots, x_{kn})$, and
$V_k$ the flow of vapor from the $k$th plate, of composition
$Y_k = (y_{k1}, y_{k2}, \ldots, y_{kn})$. Feed, at rate F and composition $X_f$,
enters at plate f, and we assume either that the feed has the same
temperature, pressure, and composition as $X_f$, or that it has the same
temperature and pressure but a composition which would produce $X_f$ and
$Y_f$ at equilibrium.

We assume total condensation of the top-plate vapor, of composition $Y_m$
and rate $V_m$, a fraction $R = L_D/(L_D + D)$ of which (the reflux ratio) is
returned to the top plate. Heat at rate Q is entered at plate 1, where also a
Bottoms product $L_1 = B$ of composition $X_1 = (x_{11}, x_{12}, \ldots, x_{1n})$
is withdrawn.

Heat and mass conservation equations may be written over the column
as a whole or over any idealized internal section, e.g., a single plate.
These equations plus the vapor-liquid equilibrium equations completely
specify the operation of an idealized column when the boundary conditions
and physical specifications of the tower are given. The composition of the
product B or D will essentially be a function of Q, m, F, $X_f$, and R, but
actually many parameters affect the product. In addition, important sources
of error are heat losses, pressure losses due to viscous flow, undesirable
chemical reactions, and the possibility that equilibrium is not attained on
each plate.

Fig. 1. A Bubble-cap Plate

A sufficient set of equations for an idealized tower may easily be
written. For example, an analysis may begin by considering the conservation
of mass on the bottom plate. For the bottom plate, from Fig. 1, we
have k = 1, $V_{k-1} = 0$, $L_1 = B$, and Q, not shown, is added. Therefore

$$L_2 = V_1 + B \qquad (1)$$

or, for each component j,

$$L_2 x_{2j} = V_1 y_{1j} + B x_{1j} \qquad (2)$$

Using $H_k$ and $h_k$ for the enthalpies of the vapor and liquid on plate k,
we have

$$V_1 H_1 + B h_1 = L_2 h_2 + Q \qquad (3)$$

For the vapor-liquid equilibrium equation we may write

$$y_{1j} = K_j(T, P)\,x_{1j} \qquad (4)$$

where T and P are the temperature and pressure of plate 1 and $K_j$ is the
partial-pressure equilibrium constant, a tabulated function. With B, $X_1$,
and Q given, there are six unknowns, so that two additional equations are
required. Two equations which prove to be convenient for machine
computation are derived from the fact that the sum of the mole fractions in a
given phase is 1:
$$\sum_j x_{2j}(T_2) = 1 \qquad (5)$$

and

$$\sum_j y_{1j}(T_1) = 1 \qquad (6)$$

Therefore, since $x_{2j}$ and $y_{1j}$ are functions of plate temperature, for
known volume and pressure the temperatures required for the computation
of H, h, and K may be obtained implicitly.
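Combining Eqs. (4) and (6) gives the implicit condition $\sum_j K_j(T_1, P)\,x_{1j} = 1$ for the plate temperature. The sketch below solves it by bisection, assuming a hypothetical model $K_j(T) = e^{A_j - B_j/T}/P$ in place of the tabulated function; the constants, the temperature bracket, and the function name `bubble_point` are illustrative only.

```python
import math


def bubble_point(x, A, B, P, lo=200.0, hi=800.0, iters=200):
    """Solve sum_j K_j(T) x_j = 1 for T by bisection (Eqs. (4) and (6)).

    K_j(T) = exp(A_j - B_j / T) / P is a hypothetical stand-in for the
    tabulated K_j(T, P); with B_j > 0 the sum increases with T, so a sign
    change on [lo, hi] brackets a single root. Returns (T, vapor y_j).
    """
    def excess(T):
        return sum(xj * math.exp(a - b / T) / P for xj, a, b in zip(x, A, B)) - 1.0

    if excess(lo) > 0 or excess(hi) < 0:
        raise ValueError("root not bracketed by [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    T = 0.5 * (lo + hi)
    y = [xj * math.exp(a - b / T) / P for xj, a, b in zip(x, A, B)]  # Eq. (4)
    return T, y
```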

This sample set of equations is appropriately modified for other plates
of the tower and, in practice, will be supplemented with subsidiary equations
(relating to heat losses, etc.) as required, which we do not need for the
present purpose. For the present, from Eqs. (1)-(6), it is plausible that,
given B, $X_1$, and Q, one may compute the initial conditions for plate 2.
Computing in this manner, matching boundary conditions between each
plate, D and $Y_m$ are eventually determined. Then, if $Y_m$ does not equal
the desired composition $X_D$, either (a) an iterative procedure is instituted
to correct $Y_m$ by varying Q, m, F, $X_f$, or R, or (b) the loop is
automatically closed, i.e., an error term may be fed back and used to correct
the given input parameters until the system, considered as a whole, is in
steady state and gives the desired output.



3. PREDICTION OF CHEMICAL EQUILIBRIUM 

Computer methods, devised for the description of complex chemical 
equilibrium, are in terms of the reaction rate equations or the equilibrium 
constant algebraic equations or finally in terms of the thermodynamic free 
energy of the equilibrium condition which may be obtained by mathematical 
programming. At equilibrium, of course, all methods give essentially the 
same information, but in addition the last method has a standard format 
which is more flexible, yields additional data on the enthalpy of each 
species, and can incorporate the so-called irreversible, time-invariant
processes.

For a single-phase mixture, equilibrium conditions may be predicted, as
described in previous papers [11, 14], by minimizing the (nonlinear)
Gibbs free-energy function

$$F(Y) = RT\,n \sum_j y_j\,[c_j + \ln y_j] \qquad (7)$$

where

$Y = (y_1, y_2, \ldots, y_n)$, the set of mole fractions,

$c_j = (F_j^\circ/RT) + \ln P$, where $F_j^\circ$ is the standard free energy per mole of the $j$th species,

$n = \sum_j n_j$, the total number of moles of all species,

$y_j$ = mole fraction of the $j$th species,

under the (linear) conditions of conservation of mass and $y_j \ge 0$ for
each j. The right side of Eq. (7) is simply the sum of the Gibbs free
energies of the species. For mixtures of two phases, one phase may be
regarded as the "standard state" and the other phase may be computed with
respect to this state. Thus, if $F_j^\circ/RT = 0$ in the vapor phase, and using
z for either x or y,

$$F(Z) = RT\,n_1 \sum_j y_j\,[0 + \ln P + \ln y_j] + RT\,n_2 \sum_j x_j\,[c_j + \ln x_j] \qquad (8)$$

where 

$c_j = (\Delta F_j^\circ/RT) + \ln P = \Delta F_j^\circ/RT$, liquid phase,

$\Delta F_j^\circ$ = change in standard free energy per mole for the $j$th species,

$n_1, n_2$ = total moles in the vapor and liquid phases.

For vapor-liquid equilibria, P is the total pressure in atmospheres for the
vapor phase, and $\ln P = 0$ for the liquid phase. Alternatively, Eq. (8) may
be regarded as the statement of a chemical reaction in either phase, in which
case $c_j$ becomes the free energy of formation of the product species. For
the present distillation:



Thus, for the distillation column, we replace equation (4) by equation (8)
and minimize (8) under the restrictions or conditions

$$x_j \ge 0, \quad y_j \ge 0 \quad \text{(the output species are not negative)} \qquad (10)$$

and conservation of mass

$$N_i = n_{1i} + n_{2i} = \text{total moles of the } i\text{th input species in both phases on a plate} \qquad (11)$$

However, since the equilibrium concentrations are independent of the amount of the total mixture, we may either assume that the total moles in each phase equal a constant or, from PV = nRT and the dimensions of the tower, compute the actual $N_i$. Assuming $n_1 = 1$ and $n_2 = 1$, we may replace equation (11) with

$$\phi_i = N_i - x_i - y_i = 0, \quad \text{for each } i \tag{12}$$
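As a worked illustration (our own, not taken from the paper), take the ideal vapor-liquid case of Eqs. (8) and (12) with $n_1 = n_2 = 1$, identify the input species i with the product species j, and introduce a hypothetical multiplier $\lambda_j$ for each constraint $x_j + y_j = N_j$. Setting to zero the derivatives of $F/RT - \sum_j \lambda_j (x_j + y_j - N_j)$ gives

$$\begin{aligned}
\ln P + \ln y_j + 1 &= \lambda_j, \qquad c_j + \ln x_j + 1 = \lambda_j \\
\Rightarrow\quad K_j \equiv \frac{y_j}{x_j} &= \frac{e^{c_j}}{P}, \qquad x_j = \frac{N_j}{1 + K_j}, \qquad y_j = \frac{N_j K_j}{1 + K_j}
\end{aligned}$$

Because $z \ln z$ is convex, this stationary point is the minimum of (8) under (10) and (12). In this sketch the total pressure enters the split only through the additive term $-\ln P$ in $\ln K_j = c_j - \ln P$.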

Chemical reactions may be incorporated, and the resulting stoichiometric restrictions [including Eq. (12)] may be organized into a matrix format by writing, instead of (12),

$$\phi_i = N_i - \sum_j a_{ij} z_j = 0 \tag{13}$$

where the $a_{ij}$ are formula numbers indicating the atoms of species i in product j. For ideal vapor-liquid equilibrium, all $a_{ij} = 1$ and the matrix is diagonal for each phase (z represents either x or y).

For computational purposes we incorporate the restrictions (12) or (13) 
into (8) using Lagrange multipliers and find the min-max of 



$$G(Z, \pi) = F(Z) + \sum_i \pi_i \phi_i \tag{14}$$



under (10), which clearly has the same minimum with respect to Z as the original problem. At equilibrium the vectors Z and π satisfy (14) and give, respectively, the moles (or mole fractions) of each species and the free-energy contribution of each species present. To see this latter interpretation of the components of the vector π, we may consider the first partial derivatives of G, which arise for purposes of computation by the method of steepest descent.

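To compute, either on the analog or the digital computer, we may write

$$\begin{aligned}
\frac{\partial G}{\partial z_j} &= c_j + \ln z_j - \sum_i a_{ij}\pi_i = 0, && \text{for each } j \\
\frac{\partial G}{\partial \pi_i} &= \phi_i(Z) = 0, && \text{for each } i
\end{aligned} \tag{15}$$

with $z_j > 0$ for all j,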

and require that these partials vanish for all i and j. The first equation of (15) clearly defines the $\pi_i$ as in the above paragraph; the second is the conservation of mass. Detailed procedures for computing a chemical equilibrium on the digital machine are given in Ref. [9], an example in Ref. [10]. The analog procedure is given in Refs. [11], [12], and [13]. But, generally speaking, we can satisfy zero-sum equations such as (15) by implicit computation. In the two equations of (15) we begin by assigning arbitrary values to the unknowns π and Z and arrange matters so that, if they are in error, a negative feedback signal forces the system to correct their values. A certain amount of analysis is required for convergence and stability, but the idea is not new: Kose [15], in 1956, demonstrated the conditions for convergence.
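On a digital machine the negative-feedback idea can be imitated by a simple multiplier iteration. The sketch below is our own illustration, not the procedure of Refs. [9] to [13]: it solves the first equation of (15) exactly for each $z_j$ at the current π, measures the mass-balance residuals $\phi_i = N_i - \sum_j a_{ij} z_j$, and moves each $\pi_i$ in proportion to its residual (a dual-ascent step). The function name, the fixed `rate`, the tolerance, and the example numbers are assumptions; convergence for a given `rate` is not guaranteed in general.

```python
import math


def equilibrium_by_feedback(a, c, N, rate=0.2, steps=5000, tol=1e-10):
    """Hypothetical helper: find z, pi with
         c_j + ln z_j - sum_i a[i][j] * pi_i = 0   (first equation of (15))
         N_i - sum_j a[i][j] * z_j          = 0   (mass balance)
    by feeding the mass-balance error back into the multipliers pi."""
    m, n = len(a), len(c)
    pi = [0.0] * m  # arbitrary starting values for the unknowns
    z = [0.0] * n
    for _ in range(steps):
        # Solve the first equation of (15) for each z_j at the current pi.
        z = [math.exp(sum(a[i][j] * pi[i] for i in range(m)) - c[j]) for j in range(n)]
        # Residuals phi_i: positive when input species i is under-accounted for.
        phi = [N[i] - sum(a[i][j] * z[j] for j in range(n)) for i in range(m)]
        if max(abs(r) for r in phi) < tol:
            break
        # Negative feedback: raising pi_i increases every z_j with a_ij > 0.
        pi = [pi[i] + rate * phi[i] for i in range(m)]
    return z, pi


# Ideal two-species vapor-liquid example (illustrative numbers only).
# Columns: y_1, y_2 (vapor, c = ln P), then x_1, x_2 (liquid, c = c_j).
P = 2.0
a = [[1, 0, 1, 0],
     [0, 1, 0, 1]]
c = [math.log(P), math.log(P), 0.0, math.log(6.0)]
N = [1.0, 1.0]
z, pi = equilibrium_by_feedback(a, c, N)
print([round(v, 6) for v in z])
```

For this ideal case the iteration should settle at $y_j/x_j = e^{c_j}/P$, that is, $(y_1, y_2, x_1, x_2) \approx (1/3, 3/4, 2/3, 1/4)$. A small fixed `rate` keeps each update a damped correction, which is the digital counterpart of the stability analysis mentioned above.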
4. DISCUSSION, AN APPLICATION, AND CONCLUSIONS

Reduced to its essential content the suggestion of this paper is to re- 
place the computation represented by Eq. (4), Sec. 2 by the procedure of 
Sec. 3. But in practice, the remaining equations of Sec. 2 will be altered 
and the amount of computation reduced. However, in a feasibility study, 
Eq. (4) was replaced by Eqs. (15) and a problem in the stabilization of 
natural gasoline was simulated on an analog computer [13]. The feed con- 
tained six components and under the conditions of the experiment, the 
desired separation was attained with six plates. 

Computationally some of the detailed changes involved are: 

(a) The $c_j = \ln K_j$ are no longer functions of two variables, since the ambient pressure is introduced as an additive term in Eqs. (8) and (14).

(b) Although this is an advantage only on the analog computer, the vari- 
able multiplications of Eq. (4) have been replaced by log function genera- 
tion. 

(c) On the analog, more amplifiers were required than we presume to be the case for Eq. (4).

(d) The computation time for a new equilibrium is usually very short, a few milliseconds; hence the computed equilibrium is responsive to a continuously changing parameter and would follow it continuously.

(e) In the usual procedure, Eq. (6) implicitly determines the vapor 
temperature along with Eq. (4) from the fact that the total pressure is 
equal to the sum of the partial pressures. 



The temperature is changed until Eq. (16) is satisfied. This procedure 
is still possible and was used in the example problem; however it may be 
more convenient to have n and T fixed and determine P as in (12). The 
T may be computed from the enthalpy and mass flow rate. 

(f) Chemical reactions may be incorporated with no changes in format; the matrix of coefficients, $a_{ij}$, becomes nondiagonal.

(g) Following classical procedures, unequal chemical potentials across phase boundaries or membranes may be simulated by incrementing $(\Delta F_j^0/RT)$ for the affected species. The $(\Delta F_j^0/RT)$ may thus depend upon variables other than temperature and pressure, for example, flow rate, concentration (activity), or electrochemical potential across phase boundaries or membranes. Each of these phenomena has been simulated for time-invariant, steady-state systems where the activities or potentials are assumed to be parameters.

(h) For these cases, either the analog or digital solution methods have 
been found to give stable solutions with good precision. 

More generally, this procedure separates the equilibrium computation from the mass flow. With respect to the temperature and pressure profiles of a tower, one could regard the tower as a sequence of equilibria completely determined at each plate, with the over-all heat and mass flow boundary conditions, B, R, F, $X_F$, and Q, as forcing functions. That is, even though these functions form a dependent set for a fixed distillate composition, they do determine the pressure and temperature profiles as monotone "step-like" functions. As such, the column may be regarded as a thermodynamically bounded system open to the environment, its interior determined by the boundary conditions. Then the usual procedures of partial differential equations, e.g., relaxation, may be indicated, where the nodes are determined by equilibrium computation. Practically, this simulation will be much enhanced if the pressure profile can be assumed. It is still a conjecture that there will be a unique solution if both the pressure and temperature profiles must be computed, although the conjecture is reasonably founded on the simpler case of constant pressure.

Finally, the natural extension of the steady-state computation procedure of Sec. 3 is being considered. This procedure will include all m plates of the tower in a single conceptual format of restriction equations applying to a single Gibbs free-energy function for the entire tower. The alternative was to have a free-energy function and restriction for each plate, the plates being linked together either by iteration from plate to plate or by a method from partial difference equations as above. This conceptual format has not been implemented on the digital computer, but would be constructed by formal compartmentalization of the matrix $a_{ij}$ in a manner similar to Ref. [10], a compartment for each plate. The difficult analytic problems are to show that this nonlinear mathematical programming problem is sufficiently determined and that it will converge.

Although a distillation column may ideally be time-invariant, actually it probably is not. Oscillatory states and transients as well as nonideal plate conditions are considered more likely. It would be of interest to use this approach to analyze the system as a whole for its intrinsic natural frequencies and transient response as in Ref. [17]. Also, it would be interesting to know whether the present procedure involving a minimization could be incorporated into the linear programming routine for product distribution throughout the refinery.
OPTIMAL CAPACITY SCHEDULING



Arthur P. Veinott, Jr. and Harvey M. Wagner 

ABSTRACT 

The purpose of this paper is twofold: (1) to exhibit simple and efficient 
algorithms for solving a particular class of optimization problems, and (2) 
to demonstrate the wide applicability of this class, which includes, as sig- 
nificant models, capacity scheduling, equipment replacement and overhaul, 
labor force planning, and multi-commodity warehouse decisions. Of some
importance is the fact that not only do our algorithms assist in solving 
generalized versions of these models, but in many cases, such as equip- 
ment replacement, they actually improve on computational schemes hereto- 
fore proposed for simplified versions. We have tried to avoid confusion 
that would be engendered by simultaneously referring to several of the 
models, by keeping our exposition in terms of one particular problem, 
capacity scheduling; we do, however, turn attention to the other inter- 
pretations of the model. The specific capacity scheduling problem is de- 
scribed as follows: a decision maker must contract for warehousing ca- 
pacity over n time periods, the minimal capacity requirement for each 
period being deterministically specified. His economic problem arises 
because savings may possibly accrue by his undertaking long-term leasing 
or contracting at favorable periods of time, even though such commitments 
may necessitate leaving some of the capacity idle during several periods. 
Clearly this programming model might also apply to other types of capacity, 
such as transport facilities, insurance protection, and leased telephone 
lines. 
THE PERSONNEL ASSIGNMENT PROBLEM

David J. Fitch 

ABSTRACT 

The personnel assignment problem was formulated and in a sense solved 
by Brogden in his 1946 Psychometrika paper. He stated the problem as one
of devising a procedure "for maximizing efficiency of selection and assign- 
ment when each individual may be eligible for several assignments." This 
means assigning men in such a way as to both maximize the sum of the 
expected contributions and to meet the required quotas. 

The paper (1) points out the fact that the Army has a large problem in 
assigning men and has substantial information which could help in making 
assignment decisions, and that the absence of a computerized model which 
can handle the rather complex situation means that decisions made are 
poorer than are necessary, (2) traces the history of the problem, (3) com- 
pares the Brogden approach where each iteration is optimal and conver- 
gence is toward a feasible solution with the simplex procedure where each 
iteration yields a feasible solution and convergence is toward one which is 
optimal, (4) outlines Dwyer's contribution for solving for the job constants 
needed in the Brogden solution, and (5) describes a program we have 
written and which is running on the IBM 1401 for assigning up to 10,000 
men with as many as 100 jobs and which is able to solve this size problem 
in a very reasonable length of time. 
An Algorithm for Integer Solutions to Linear Programs†



Ralph E. Gomory 
INTRODUCTION 

This report describes a method based on G. B. Dantzig's simplex 
algorithm for solving linear programming problems in integers. This 
method has been outlined before in [3] and [3a] and is closely related to 
previous work by Dantzig, Fulkerson, and Johnson [1] and Markowitz and 
Manne [2]. 

A general description of the method is given in Section 1. In Section 2 
the main class of inequalities used in the method is derived and shown to 
form a group. Section 3 gives a geometrical interpretation of the in- 
equalities. In Section 4 some properties of the inequality group are de- 
rived. Section 5 discusses briefly ways of choosing particularly effective 
inequalities. In Section 6 a variant of the basic inequalities is discussed. 
Section 7 contains a description of the lexicographical dual simplex method 
used in the finiteness proofs. Section 8 gives two versions of the method 
and shows that they obtain the integer answer in a finite number of steps. 
Section 9 contains miscellaneous comments including remarks on possible 
extensions, programming experience, etc. Section 10 contains a summary 
of the procedure and small worked out problems illustrating some of the 
results of the preceding sections. The later sections depend only on 
Sections 1 and 2, with an occasional reference to the beginning of 4. 

The idea of adding inequalities to a linear programming problem to 
progress toward an integer solution has already been used in [1] and [2]. 
Here we show how to add such inequalities automatically, and prove that 
by use of a certain class of inequalities the integer solution is actually 
attained. 

The notation and general approach to the simplex method used 
throughout is that of A. W. Tucker. Both A. W. Tucker and E. M. L. 
Beale have contributed many valuable suggestions. 

1. GENERAL DESCRIPTION 

The integer programming problem is the problem of finding non- 
negative integers Xj maximizing 

†This work appeared originally as Princeton-IBM Mathematics Research
Project Technical Report Number 1, November 17, 1958. 
$$z = a_{0,0} + \sum_{j=1}^{n} a_{0,j}(-x_j)$$

subject to the conditions

$$a_{i,0} + \sum_{j=1}^{n} a_{i,j}(-x_j) \ge 0, \qquad i = 1, \ldots, m.$$

The inequalities above can be replaced by equations involving additional nonnegative "slack" variables $x_{n+i}$, and, for purposes of exposition, the whole problem will be enlarged by the addition of a series of trivial relations $x_j = -1(-x_j)$ (unit rows). This way all the variables involved are expressed in terms of the independent (or nonbasic) variables appearing on the right in the enlarged set



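$$\begin{aligned}
z &= a_{0,0} + \sum_{j=1}^{n} a_{0,j}(-x_j) \\
x_{n+i} &= a_{i,0} + \sum_{j=1}^{n} a_{i,j}(-x_j), \qquad i = 1, \ldots, m \\
x_s &= 0 + \sum_{j=1}^{n} (-\delta_{s,j})(-x_j), \qquad s = 1, \ldots, n
\end{aligned} \tag{1-1}$$

Here the $\delta_{s,j}$ ($\delta_{s,j} = 1$ if $s = j$, $\delta_{s,j} = 0$ otherwise) indicate the unit rows.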

Whenever it is convenient to have a complete set of unit rows, i.e., a 
set with a -1 in every column including the zero-column, we can also 
adjoin the trivial equation -1 = -1(1). 

Rewriting (1-1) in matrix form gives

$$X = A^0 T \tag{1-2}$$

with X the (m + n + 1)-vector, with z as first component, representing all the variables; T the (n + 1)-vector with 1 as first component and other components $-t_j$ representing the variables on the right in Eq. (1-1); and $A^0$ the matrix of all constants appearing on the right in (1-1).
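To make the layout of (1-1) and (1-2) concrete, here is a small sketch in plain Python lists. The helper names `build_tableau` and `evaluate` and the example problem are ours; rows are ordered as z, the m slack rows, then the n unit rows, and column 0 is the column of constants.

```python
def build_tableau(objective, constraints):
    """objective = [a_00, a_01, ..., a_0n]; constraints = rows [a_i0, a_i1, ..., a_in].
    Returns the matrix A^0 of (1-1): the z row, the m slack rows x_{n+i},
    and the n unit rows x_s = 0 + sum_j (-delta_sj)(-x_j)."""
    n = len(objective) - 1
    unit_rows = [[0] + [-1 if s == j else 0 for j in range(n)] for s in range(n)]
    return [list(objective)] + [list(row) for row in constraints] + unit_rows


def evaluate(A, x):
    """Compute X = A T with T = (1, -x_1, ..., -x_n), as in (1-2)."""
    T = [1] + [-v for v in x]
    return [sum(a_ij * t for a_ij, t in zip(row, T)) for row in A]


# Maximize z = 3x_1 + 2x_2 subject to x_1 + x_2 <= 4 (illustrative numbers).
# In the form of (1-1): z = 0 + (-3)(-x_1) + (-2)(-x_2),  x_3 = 4 + 1(-x_1) + 1(-x_2).
A0 = build_tableau([0, -3, -2], [[4, 1, 1]])
print(evaluate(A0, [1, 2]))  # [z, x_3, x_1, x_2] -> [7, 1, 1, 2]
```

The unit rows simply reproduce the nonbasic variables, which is what lets every variable, basic or not, be read off as a row of $A^k$ in what follows.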

To solve the ordinary linear programming problem one applies George Dantzig's simplex algorithm (or the dual algorithm) to Eq. (1-2) and, by a series of pivot steps (which are equivalent to choosing different sets of nonbasic variables), produces a series of new equations

$$X = A^k T^k$$

in which $T^k$ represents the variables that are nonbasic after the kth pivot step, and the $A^k$ are transformed into their successors by right multiplication by non-singular matrices. (The dual simplex method in this form is described in more detail in Section 7.)

The solution to the ordinary linear programming problem is obtained when an $A^k$ is obtained with the special properties

$$\begin{aligned}
a_{0,j} &\ge 0, \qquad j = 1, \ldots, n \\
a_{i,0} &\ge 0, \qquad i = 1, \ldots, n + m
\end{aligned} \tag{1-3}$$

the solution then being X = α, with α the first column, or column of constants, in $A^k$.

We can now outline an algorithm for obtaining the solution to the integer 
programming problem. 

The initial matrix $A^0$ (we will assume here that it is a matrix of integers) is transformed by the simplex method into the form (1-3). If the solution X is not in integers, a new equation or equations (each representing a new inequality and its slack variable) are added to the set $A^k$. It will be shown that this inequality is satisfied by any nonnegative integer solution to (1-2), so that its addition does not eliminate any nonnegative integer solution to the original problem. Each additional equation introduces a single negative element into the zero column, so the enlarged matrix is not in the form (1-3). The (dual) simplex method is then applied to the new matrix to bring it back to the desired form. If the new solution is still non-integer, the process is repeated. During this process a new row can be dropped as soon as its slack variable becomes strictly positive. It is shown that in a finite number of steps a final matrix $A^r$ is obtained with the following properties:

(i) it is of the form (1-3),
(ii) all entries are integers,
(iii) it contains n or fewer additional rows, all unit rows.

If we disregard the trivial equations represented by the additional unit rows, we have the equations represented by the first n + m + 1 rows

$$X = A^r T$$

where T, the vector of the current nonbasic variables, can include some of the new slack variables, and X (since the extra rows have been dropped) is simply the vector of original variables. Just as in the standard simplex method, a solution is now obtained by setting all variables in T equal to zero. Because $A^r$ is an integer matrix in the form (1-3), the solution so obtained is integer, nonnegative, and maximizes z. Thus it is the solution to the integer programming problem.

We will also show that $A^r$ is a unimodular transform of $A^0$ and that, if the (readily available) inverse of this transformation is applied to the extra rows of the final matrix, the resulting rows represent the set of additional inequalities expressed as all-integer inequalities (or equations) in the original variables.
2. THE BASIC INEQUALITIES

We will now proceed to show that each matrix $A^k$ has implicit in it a class of additional inequalities satisfied by any nonnegative integer solution to the problem.

In terms of X and $T^k$ the equations of $A^k$ are

$$X = A^k T^k \tag{2-1}$$

However, any nonnegative integer solution X' also satisfies the relation

$$X' \equiv 0 \pmod{1} \tag{2-2}$$

where we say that two numbers are equivalent (≡) modulo 1 if they differ by an integer.

From (2-1) and (2-2) we have, for T', which consists of the nonbasic variables from the integer solution X',

$$0 \equiv A^k T' \pmod{1}. \tag{2-3}$$

This gives a set of equations which T' must satisfy if it is to produce an integer X'; however, these are not the only equations that must be satisfied by T'. Any integer multiple of an equation of (2-3) produces another equation satisfied by T', and so does any sum of the equations of (2-3). If we regard any equation as being given by its row vector, we find a whole class of equations satisfied by T'. This class is the module M generated by the rows of $A^k$ over the integers.

From some of these equations (rows) we will deduce new inequalities. 

Suppose that an equation of M

$$0 \equiv a_0 + \sum_{j=1}^{n} a_j(-t_j) \tag{2-4}$$

has the property that $a_j \ge 0$ for all $j \ge 1$. Then from this equation an inequality can be deduced. Rewriting (2-4) we have

$$a_0 \equiv \sum_{j=1}^{n} a_j t_j$$

Now the right-hand side of this equation is nonnegative, since everything appearing there is nonnegative. It is also equivalent to the left-hand side, which can be represented as the sum of an integer $n_0$ and a nonnegative fractional part $f_0$. Since the right side is nonnegative and differs from the left by an integer, it must be either $f_0$, $1 + f_0$, $2 + f_0$, etc. Consequently

$$f_0 \le \sum_{j=1}^{n} a_j t_j \tag{2-5}$$

This inequality can be expressed in the form of an equation by introducing the slack s:

$$s = -f_0 - \sum_{j=1}^{n} a_j(-t_j) \tag{2-6}$$

which is the difference of the two sides of (2-5). The inequality is then expressed by the requirement that s be nonnegative. Since s is the difference of two equivalent numbers [the two sides of (2-5)], s is also an integer.

Equation (2-6) then is a new equation in nonnegative integers which must be satisfied by the T' of any integer solution, and it represents a new inequality. If $f_0 \ne 0$, i.e., $a_0$ is not an integer, the $T^k$ of the present trial solution, $t_j = 0$, $1 \le j \le n$, does not satisfy the new inequality, for these $t_j$ values, substituted in (2-6), give s the negative value $-f_0$. Equation (2-6) then represents a new equation which could be added to the equations of $A^k$. Since s is required to be a nonnegative integer, the new problem is still a problem in nonnegative integers.

Since there are many (in fact a countable infinity of) equations in M satisfying the condition $a_j \ge 0$, $1 \le j \le n$, a whole family of new inequalities could be deduced by this reasoning and applied. However, we will be able to do something better. We will replace this large family by a smaller (finite) family F of inequalities. The inequalities of F will be generated from $A^k$ in a very simple way, and every inequality of the larger family will be implied by some inequality of F.

To do this we consider the effect of decreasing by integer amounts the $a_j$, $j \ge 1$, that appear in (2-4). First of all, changing the $a_j$ by integer amounts does produce equations satisfied by T', since this change can be accomplished simply by adding or subtracting the appropriate unit rows of $A^k$. (This change is also justifiable directly: since the $t_j$ are integers, it changes the right-hand side only by an integer.) Secondly, any decrease in the $a_j$ which leaves them still nonnegative results in a new inequality. This inequality is just like (2-5), only it involves the new smaller coefficients. It is easily seen that this new inequality is stronger and in fact implies (2-5): any T' satisfying the new inequality automatically satisfies (2-5).

The strongest possible inequality obtainable from (2-4) by this process of coefficient reduction is

$$f_0 \le \sum_{j=1}^{n} f_j t_j \tag{2-7}$$

where the $f_j$ are the fractional parts of the $a_j$, each $a_j$ being represented as an integer plus some nonnegative fractional part $f_j < 1$. We will call an inequality like (2-7), in which the coefficients cannot be reduced any more, a reduced inequality.
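As a small numerical illustration (the coefficients are ours, chosen only for the example), suppose an equation of M reads $0 \equiv \tfrac{7}{2} + \tfrac{3}{4}(-t_1) + \tfrac{5}{3}(-t_2)$, so that $f_0 = \tfrac12$, $f_1 = \tfrac34$, $f_2 = \tfrac23$. Then (2-5) and its reduced form (2-7) read

$$\tfrac12 \le \tfrac34 t_1 + \tfrac53 t_2 \qquad\text{and}\qquad \tfrac12 \le \tfrac34 t_1 + \tfrac23 t_2 .$$

Because $t_2 \ge 0$ and $\tfrac23 \le \tfrac53$, every T' satisfying the second inequality also satisfies the first, while the trial solution $t_1 = t_2 = 0$ violates both, with slack $s = -\tfrac12$ in the equation form of either.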

As before we can represent this inequality by an equation

$$s = -f_0 - \sum_{j=1}^{n} f_j(-t_j)$$

and we will let the row vector of fractional parts

$$(f_0, f_1, f_2, \ldots, f_n)$$

stand for either (2-7) or for the equation form of the inequality.

This leads us to consider the mapping J which sends any row vector of M into the row vector of its fractional parts.

We assert that this sends an equation satisfied by T' into an inequality satisfied by T'.

To see this, take any equation of M

$$0 \equiv a_0 + \sum_{j=1}^{n} a_j(-t_j)$$

and, if there are negative elements among the $a_j$, $j \ge 1$, make them nonnegative by changing each such element by some integer amount. This process results in a new equation with nonnegative coefficients but with the same fractional parts. The new equation is of course still satisfied by T'. From this new equation with all nonnegative coefficients, the inequality involving the fractional parts can be derived just as above. Thus the inequality represented by the row of fractional parts is a legitimate one satisfied by T'.

Furthermore, the mapping J sends an equation like (2-4), which does have nonnegative $a_j$, into the inequality (2-7), rather than into the inequality (2-5) which is deduced directly from (2-4). Consequently we need only consider the inequalities which are represented by the fractional row vectors. Any other inequality, such as (2-5), is already implied by its reduced inequality, which is its image under the mapping J.

Suppose that under J two rows $R_1$ and $R_2$ go into two rows of fractions $F_1$ and $F_2$. Then $J(R_1 + R_2)$ is easily seen to be the row vector

$$F \equiv F_1 + F_2 \pmod{1}$$

where the components of F are obtained by adding the components of $F_1$ and $F_2$ modulo 1, i.e., adding them and dropping the integer parts. The same observation applies to $J(nR_1)$, where n is some integer: the result of the mapping is the vector of fractional parts obtained by taking $J(R_1)$, multiplying by n, and reducing modulo 1.

Because of this we are able to describe the class of reduced inequalities in a very simple way. Since all the elements of M are integer combinations of the rows of $A^k$, all the reduced inequalities are integer combinations of the images of rows of $A^k$. Hence all reduced inequalities are integer combinations of the fractional-part rows of $A^k$, addition and multiplication being interpreted modulo 1.

Under these rules of combination the fractional rows representing the reduced inequalities form a finite† additive group F, some of whose properties will be discussed in Section 4. The main result that will be produced is that under many circumstances F is actually cyclic, all the inequalities being produced as multiples modulo 1 of a single fractional row.
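The group F can be explored by brute force for small examples. The sketch below (helper names ours) closes a set of fractional rows under addition modulo 1, which suffices because in a finite group every negative is a positive multiple, and then tests whether a single row generates everything. The example rows are chosen by hand rather than taken from an actual $A^k$; the second example, with two independent half-rows, only shows what the cyclicity test detects and is not claimed to arise from a particular problem.

```python
from fractions import Fraction
import math


def frac(x):
    """Fractional part of x, in [0, 1)."""
    return x - math.floor(x)


def generated_group(rows):
    """All integer combinations, modulo 1, of the given fractional rows.
    Closing under addition suffices: in a finite group every negative is a positive multiple."""
    gens = [tuple(frac(v) for v in row) for row in rows]
    zero = tuple(Fraction(0) for _ in gens[0])
    group, frontier = {zero}, [zero]
    while frontier:
        g = frontier.pop()
        for h in gens:
            s = tuple(frac(u + v) for u, v in zip(g, h))
            if s not in group:
                group.add(s)
                frontier.append(s)
    return group


def multiples(row, count):
    """The rows k * row (mod 1) for k = 0, ..., count - 1."""
    return {tuple(frac(k * v) for v in row) for k in range(count)}


def is_cyclic(group):
    """True if some single row generates the whole group as multiples mod 1."""
    return any(multiples(g, len(group)) == group for g in group)


cyclic_example = generated_group([
    (Fraction(1, 5), Fraction(3, 5), Fraction(2, 5)),
    (Fraction(2, 5), Fraction(1, 5), Fraction(4, 5)),  # = 2 x first row (mod 1)
])
print(len(cyclic_example), is_cyclic(cyclic_example))  # 5 True
klein_example = generated_group([(Fraction(1, 2), Fraction(0)), (Fraction(0), Fraction(1, 2))])
print(len(klein_example), is_cyclic(klein_example))    # 4 False
```

Exact `Fraction` arithmetic is used so that "dropping the integer parts" is exact; floating point would make the set comparisons unreliable.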

Also because of their origin all these inequalities have the following 
property: if either they, or their slack equations, are expressed in terms 
of the original nonbasic variables, they become all integer inequalities 
(or equations). Thus the original problem as expressed in terms of its 
original variables is being enlarged by adding more all integer inequalities. 

(In actual machine programming so far we have produced the additional inequality by simply choosing a row of $A^k$ with non-integer constant term and writing the new equation

$$s = -f_0 - \sum_{j=1}^{n} f_j(-t_j)$$

by simply taking the fractional parts of that row. If you are doing things in this simple way, it helps noticeably to take the row with the largest fractional part $f_0$.)
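The simple rule described in this parenthetical remark can be sketched as follows. This is a minimal illustration under our own assumptions: rows of $A^k$ are given as lists $(a_0, a_1, \ldots, a_n)$ of exact `Fraction`s, the helper names `frac`, `gomory_cut`, and `cut_slack` are hypothetical, and no simplex step is performed. The checks confirm, on a made-up row, that every nonnegative integer $(t_1, t_2)$ making the row's value an integer gives a nonnegative integer slack s, while the trial solution $t = 0$ gives $s = -f_0$.

```python
from fractions import Fraction
import math


def frac(x):
    """Fractional part f of x, with 0 <= f < 1 and x = integer + f."""
    return x - math.floor(x)


def gomory_cut(rows):
    """rows: rows (a_0, a_1, ..., a_n) of the current matrix A^k, as Fractions.
    Returns the fractional-part row (f_0, f_1, ..., f_n) of the row whose constant
    term has the largest fractional part, or None if all constants are integers."""
    best = max(rows, key=lambda row: frac(row[0]))
    if frac(best[0]) == 0:
        return None
    return [frac(a) for a in best]


def cut_slack(cut, t):
    """Slack s = -f_0 - sum_j f_j * (-t_j) of the new equation, for nonbasic values t."""
    f0, fs = cut[0], cut[1:]
    return -f0 - sum(f * (-tj) for f, tj in zip(fs, t))


row_set = [
    [Fraction(7, 2), Fraction(3, 4), Fraction(-1, 3)],
    [Fraction(9, 4), Fraction(1, 2), Fraction(5, 3)],
    [Fraction(2), Fraction(1), Fraction(0)],
]
cut = gomory_cut(row_set)
print(cut)                     # [Fraction(1, 2), Fraction(3, 4), Fraction(2, 3)]
print(cut_slack(cut, [0, 0]))  # -1/2
```

Note that the negative coefficient $-\tfrac13$ becomes $\tfrac23$, exactly the reduction to nonnegative fractional parts argued in Section 2.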

3. A GEOMETRICAL INTERPRETATION 

Inequalities such as (2-5) can be given a simple geometrical interpretation. To see this most easily, let us suppose that the linear programming problem was given originally in terms of integer inequalities, in which we now designate the variables by Y:

$$AY \le B$$

We will consider the convex body C that these inequalities cut out in the space of the variables Y. Confining ourselves to this space makes geometrical interpretation easier. If we assume that no s-variables have yet been added, the other variables in any solution X are simply the slacks added in converting the inequalities above into equations. Clearly each of these slacks is an integer combination of the $y_j$.

We are now ready to consider the origin of an inequality like (2-5). Equation (2-4) was obtained as an integer combination of the equations represented by the rows of $A^k$. If X is any solution (integer or not), each right-hand side of these equations (the rows of $A^k$) equals some variable. Consequently the right-hand side of (2-4) satisfies

$$a_0 + \sum_{j=1}^{n} a_j(-t_j) = \sum_i n_i x_i$$

where $\sum_i n_i x_i$ is some linear integer combination of variables $x_i$. Since each slack among the $x_i$ is an integer combination of the variables $y_j$, we actually have, by substitution and for suitable integers $n_j$,

$$a_0 + \sum_{j=1}^{n} a_j(-t_j) = \sum_j n_j y_j \tag{3-1}$$

So the right-hand side of (2-4) represents a linear integer form L in the original variables Y.

†F will be finite whenever the matrix A is in integers or has entries which can be written as integers over some greatest common divisor. See Section 4.

The trial solution $t_j = 0$ to (2-1) gives a certain X and hence a Y. This Y, as is well known, is a vertex P of C.

If in Y space we consider the hyperplane $L = a_0$, (3-1) shows that the hyperplane passes through P (see Fig. 1).

If the $a_j$ of (3-1) are all nonnegative, L takes on its maximum value at this vertex. Hence it is drawn externally tangent in Fig. 1.

If $a_0$ is non-integer, and hence a sum $n_0 + f_0$ with $f_0 > 0$, we can push the $L = a_0$ line (hyperplane) into C as far as the line $L = n_0$ without cutting off any lattice points. (These all-integer coordinate points are the dots in Fig. 1.) This is because, if there were a lattice point between $L = a_0$ and $L = n_0$, it would, since L is an integer form, give an integer value to L. But L has no integer value between $a_0$ and $n_0$. Thus L can safely be pushed in, and $L \le n_0$ gives a new inequality. (Unless L has a common factor in its coefficients, it is actually pushed until it strikes some lattice point somewhere.)

With regard to s-variables, it is necessary only to note that, from their construction, they are an integer combination of the variables $x_i$ already present when they were constructed. Hence they too are integer combinations of the original $y_j$, and the geometrical interpretation goes through unchanged.




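[Fig. 1: the convex body C with vertex P, the externally tangent hyperplane L = a_0, and the pushed-in hyperplane L = n_0; lattice points are shown as dots.]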
4. THE GROUP OF INEQUALITIES

We already know that the inequalities we need be concerned with are those formed by integer combinations of the fractional parts of the rows of $A^k$. The main part of this section is devoted to showing that under many circumstances the whole group F of inequalities can be obtained as multiples modulo 1 of a single row vector.

We will also show that F always contains D or fewer elements, where D is a number defined below.†

All these conclusions depend on a theorem of A. W. Tucker [6], which we restate here in the (weaker) form most suitable for our immediate purposes. Let $A^0$ be an $m' \times n'$ matrix, $m' > n'$. Let $P^{-1}$ be the inverse of an $n' \times n'$ matrix P consisting of any n' rows of $A^0$. Let $A^k$ be given by

$$A^k = A^0 P^{-1}$$

Then the theorem asserts that each subdeterminant of $A^k$ is obtained by multiplying an appropriate subdeterminant of $A^0$ by $|P^{-1}|$, where $|P^{-1}|$ indicates the determinant of $P^{-1}$.

We outline the proof of this theorem. Let β be any square submatrix of $A^k$. By interchanging rows and adding the appropriate unit rows from among the complete set of unit rows in $A^k$, β can be enlarged to a square $n' \times n'$ submatrix β', with β as a block on the main diagonal of β'. Since β' consists of n' rows of $A^k$, it is the transform by $P^{-1}$ of the square submatrix $M(\beta')$ consisting of the corresponding rows of $A^0$. Thus we have $\beta' = M(\beta')\,P^{-1}$, and taking determinants, $|\beta'| = |M|\,|P^{-1}|$. But β' consists only of diagonal 1's and the block β, so $|\beta| = |P^{-1}|\,|M|$, as desired.

The $P^{-1}$ which transforms our original $A^0$ into $A^k$ in the simplex method is the inverse of an (n + 1)-rowed matrix P consisting of rows taken from $A^0$, so the theorem applies in the case in which we are interested. Of course our $P^{-1}$ is simply the product of the matrices $P_i$ representing the individual pivot operations,‡ and each of these $P_i$ has determinant $1/p_i$, where $p_i$ is the pivot element. Consequently, as is well known, $|P^{-1}| = 1/(p_1 p_2 \cdots p_k)$. Clearly $p_1 p_2 \cdots p_k = |P|$, and since P is made up from the integer matrix $A^0$, $|P|$ is an integer D.

Similarly, any subdeterminant of $A^0$ is an integer. Consequently Tucker's theorem, applied here, states that any subdeterminant of $A^k$ is of the form m/D, where m is some integer (the value of a subdeterminant of $A^0$) and D is the (integer) product of all preceding pivots.
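The 1 × 1 case of this statement (every entry of $A^k$ is an integer over D) is easy to check numerically. In the sketch below, which is ours, each pivot right-multiplies the matrix by a nonsingular matrix of determinant $1/p$ so that the pivot row becomes a unit row, as in the column form of the simplex method. For simplicity we assume a +I block of unit rows on top (the report's unit rows carry $-\delta_{s,j}$) and omit the zero column; neither choice affects the denominator argument. The example matrix and the helper names are illustrative only.

```python
from fractions import Fraction


def column_pivot(A, r, c):
    """Right-multiply A by a nonsingular matrix of determinant 1/p (p = A[r][c]):
    column c is divided by p and multiples of the old column c are subtracted
    from the other columns, so that row r becomes the unit row e_c."""
    p = A[r][c]
    if p == 0:
        raise ValueError("pivot element must be nonzero")
    ratios = [v / p for v in A[r]]
    return [
        [row[c] / p if j == c else row[j] - ratios[j] * row[c] for j in range(len(row))]
        for row in A
    ]


def run_pivots(A0, pivots):
    """Apply the (row, column) pivots in order; return A^k and D = p_1 p_2 ... p_k."""
    A = [[Fraction(v) for v in row] for row in A0]
    D = Fraction(1)
    for r, c in pivots:
        D *= A[r][c]
        A = column_pivot(A, r, c)
    return A, D


# Integer A^0: a block of unit rows on top, then three further integer rows.
A0 = [[1, 0, 0], [0, 1, 0], [0, 0, 1],
      [3, 1, 2], [1, 4, 1], [2, 2, 5]]
Ak, D = run_pivots(A0, [(3, 0), (4, 1), (5, 2)])
print(D)                                                         # 39
print(all((D * v).denominator == 1 for row in Ak for v in row))  # True
```

Here the intermediate pivots are 3, 11/3, and 39/11, yet their product D = 39 is an integer, namely the determinant of the three pivot rows of $A^0$, and multiplying every entry of $A^k$ by D clears all denominators.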

We will need this property not for $A^k$ itself but rather for the matrix $F(A^k)$ whose rows are the fractional parts of the corresponding rows of $A^k$. However, it is not hard to show that this property does carry over from $A^k$ to $F(A^k)$.

†Actually it contains exactly D elements (see concluding paragraphs, this section).

‡This matrix is shown in Section 7.
To see this we can split $A^k$ into the sum of two matrices U and F'. The rows of F' are all the unit rows of $A^k$ and the fractional parts of the other rows. U consists of zeros in the rows corresponding to unit rows of $A^k$ and contains the integer parts of the other rows. Then

$$F' = A^k - U = A^0 P^{-1} - U = (A^0 - UP)P^{-1} = \bar A P^{-1}$$

$\bar A$ is an integer matrix since UP is, and $\bar A$ still contains all the rows of P unchanged; subtracting UP from $A^0$ only subtracts zeros from these rows. Consequently $P^{-1}$ is still the inverse of the $(n+1) \times (n+1)$ matrix P, made up now from rows of $\bar A$, and the conditions for Tucker's theorem are still satisfied. So, by the same reasoning as before, all subdeterminants of F' are of the form m/D, and finally, since the subdeterminants of $F(A^k)$ are included among those of F', we have the same result for the subdeterminants of $F(A^k)$.

Our problem is to find out something about the row vectors generated by integer combinations of the rows of $F(A^k)$ (components being combined modulo 1). We can now recast this problem into a familiar form. Since the elements of $F(A^k)$ (its $1 \times 1$ subdeterminants) are of the form $m/D$, multiplication by $D$ produces a matrix $\bar{F}$ of integers. (These integers, of course, are simply the numerators that appear if all the fractions in $A^k$ are written in the form $m/D$.) The subdeterminant property of $F(A^k)$ translates readily into the following property of $\bar{F}$: every $r \times r$ subdeterminant of $\bar{F}$ is divisible by $D^{r-1}$.
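To make the passage from $F(A^k)$ to $\bar{F}$ concrete, here is a minimal Python sketch. It assumes the current matrix is held as a list of rows of `fractions.Fraction` values and that $D$ is already known; the names `fractional_part` and `numerator_matrix` are illustrative only. The sample matrix is a made-up illustration: the tableau reached after one pivot (pivot element $-2$, so $D = 2$) on the small problem "maximize $z = -(t_1 + t_2)$ subject to $2t_1 + t_2 \ge 3$".

```python
from fractions import Fraction
from math import floor


def fractional_part(x):
    """x - floor(x), which lies in [0, 1) even for negative x."""
    return x - floor(x)


def numerator_matrix(Ak, D):
    """Return Fbar = D * F(A^k): the numerators of the fractional parts over D.

    Raises ValueError if some entry of A^k is not of the form m/D.
    """
    fbar = []
    for row in Ak:
        nums = []
        for x in row:
            v = fractional_part(Fraction(x)) * D
            if v.denominator != 1:
                raise ValueError(f"{x} is not of the form m/{D}")
            nums.append(int(v))
        fbar.append(nums)
    return fbar


# Rows: z, x1, t1, t2; columns: constant term, then the two nonbasic variables.
Ak = [
    [Fraction(-3, 2), Fraction(1, 2), Fraction(1, 2)],
    [0, -1, 0],
    [Fraction(3, 2), Fraction(-1, 2), Fraction(1, 2)],
    [0, 0, -1],
]
fbar = numerator_matrix(Ak, 2)
print(fbar)  # [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
```

Unit rows contribute zero rows to $\bar{F}$, since their entries are integers.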

Combining rows of $F(A^k)$ modulo 1 is clearly equivalent to combining rows of $\bar{F}$ modulo $D$, and our problem is to find, in the group $G$ whose elements are all possible vectors with entries from the additive group of integers modulo $D$, the subgroup $F$ generated by the rows of $\bar{F}$. In order to apply the standard elementary divisor theorem we will change viewpoint slightly and regard $G$ as a module over the integers with the basic elements

$$e_0 = (\bar{1}, \bar{0}, \ldots, \bar{0}), \qquad e_1 = (\bar{0}, \bar{1}, \ldots, \bar{0}),$$

etc., with $\bar{1}$ the unit of the integers modulo $D$. Then a row $(m_0, m_1, \ldots, m_n)$ of integers represents the element

$$m_0 e_0 + m_1 e_1 + \cdots + m_n e_n.$$

Now to find the submodule $F$ generated by the rows of $\bar{F}$ we apply the elementary divisor theorem (van der Waerden [7]), and find that by choosing new bases in the module $G$ and submodule $F$ we obtain a new matrix of the form

$$\begin{pmatrix} \epsilon_0 & & & \\ & \epsilon_1 & & \\ & & \ddots & \\ & & & \epsilon_r \end{pmatrix}$$

whose rows generate the submodule. As is well known, $\epsilon_i$ divides $\epsilon_{i+1}$, and $\epsilon_0$ is the greatest common divisor of all the elements of $\bar{F}$. More generally, $\prod_{i=0}^{r} \epsilon_i$ is the g.c.d. of the $(r+1) \times (r+1)$ subdeterminants of $\bar{F}$. Hence $\prod_{i=0}^{r} \epsilon_i$ is divisible by $D^r$.
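For small matrices the $\epsilon_i$ can be computed directly from the statement just made: if $d_k$ denotes the g.c.d. of all $k \times k$ subdeterminants, then $\epsilon_0 = d_1$ and $\epsilon_i = d_{i+1}/d_i$. The Python sketch below does exactly that by brute force (exponential in the matrix size, so suitable only for illustration; a Smith normal form routine would be used in practice). The sample $\bar{F}$ with $D = 8$ is a made-up matrix whose $2 \times 2$ determinant, $-24$, is divisible by $D$, as the property above requires.

```python
from fractions import Fraction
from itertools import combinations
from math import gcd, prod


def det(m):
    """Exact determinant of a square integer matrix (Gaussian elimination over Q)."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    d = Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if a[r][c] != 0), None)
        if p is None:
            return 0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return int(d)


def elementary_divisors(m):
    """epsilon_0, epsilon_1, ... from the g.c.d.s d_k of the k x k subdeterminants."""
    rows, cols = len(m), len(m[0])
    d = []
    for k in range(1, min(rows, cols) + 1):
        g = 0
        for rs in combinations(range(rows), k):
            for cs in combinations(range(cols), k):
                g = gcd(g, det([[m[r][c] for c in cs] for r in rs]))
        if g == 0:  # every k x k subdeterminant vanishes: the rank has been reached
            break
        d.append(g)
    if not d:
        return []
    return [d[0]] + [d[i] // d[i - 1] for i in range(1, len(d))]


D = 8
eps = elementary_divisors([[2, 4], [6, 0]])
print(eps)                               # [2, 12]
print(prod(eps) % D ** (len(eps) - 1))   # 0: the product is divisible by D^r with r = 1
```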

Now let us consider a special case. Let us suppose that $\epsilon_0$, the g.c.d. of all the elements of $\bar{F}$, is a number relatively prime to $D$. (Unless the numbers in the original problem had common factors or were arranged with a good deal of symmetry, it seems reasonable to suppose that this g.c.d. actually will be 1 a good part of the time, so this special case is expected to be prevalent, and in fact it has been prevalent in the examples done so far.) Then, since $D$ divides $\epsilon_0 \epsilon_1$, it must divide $\epsilon_1$ and hence all succeeding $\epsilon_i$. Since $D$ annihilates every member of $G$ (i.e., $Dg = 0$ for all $g \in G$), the rows containing $\epsilon_i$, $i \ge 1$, all represent the zero element of $G$, and consequently the module $F$ is cyclic and is generated by the single element $g$ represented in the new basis by $(\epsilon_0, 0, 0, \ldots, 0)$.

Of course it is also generated by any multiple $ng$ where $n$ is relatively prime to $D$. This result can be restated to refer directly to $F$ regarded as the group whose elements are rows of fractions of the form $m/D$. Here we can say that if an element of $F$ cannot be rewritten as a row of fractions over some new common denominator smaller than $D$, then the multiples of this row (modulo 1) already generate all of $F$.
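This restatement can be checked mechanically on small cases. The Python sketch below (helper names are hypothetical) represents each row by its numerators over $D$, generates $F$ by closing under addition modulo $D$, and compares the result with the multiples of a single row whose fractions admit no smaller common denominator. The two rows used are invented for illustration, with $D = 5$.

```python
from math import gcd


def generated_subgroup(rows, D):
    """All integer combinations of the rows modulo D, found by closing under addition."""
    gens = [tuple(x % D for x in row) for row in rows]
    zero = tuple(0 for _ in gens[0])
    seen, frontier = {zero}, [zero]
    while frontier:
        v = frontier.pop()
        for g in gens:
            w = tuple((a + b) % D for a, b in zip(v, g))
            if w not in seen:
                seen.add(w)
                frontier.append(w)
    return seen


def multiples(row, D):
    """The cyclic subgroup {k * row mod D : k = 0, ..., D - 1}."""
    return {tuple(k * x % D for x in row) for k in range(D)}


def has_full_denominator(row, D):
    """True if the fractions row/D cannot be rewritten over a smaller common denominator."""
    g = D
    for x in row:
        g = gcd(g, x)
    return g == 1


D = 5
F = generated_subgroup([[1, 2], [2, 4]], D)
print(len(F))                            # 5
print(has_full_denominator((2, 4), D))   # True
print(multiples((2, 4), D) == F)         # True: one row already generates all of F
```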

This property shows up in the example at the end of this paper. 

By using the fact that $\prod_{i=0}^{r} \epsilon_i$ is divisible by $D^r$ we can obtain a more general statement about the rank of $F$.

Let $p$ be any prime, let $\alpha_i(p)$ be the power to which the prime divides $\epsilon_i$, and let $\bar{\alpha}(p)$ be the power to which it divides $D$. Then, since $\epsilon_i$ divides $\epsilon_{i+1}$,

$$\alpha_i(p) \le \alpha_{i+1}(p),$$

and the condition which expresses divisibility by $D^r$ is

$$\alpha_0(p) + \sum_{i=1}^{r} \alpha_i(p) \ge r\,\bar{\alpha}(p). \qquad (4\text{-}1)$$

If $R(p)$ is the largest $i$ for which

$$\alpha_i(p) < \bar{\alpha}(p),$$

we have the inequality on the rank $R$ of $F$,

$$R \le 1 + \max_{p \mid D} R(p), \qquad (4\text{-}2)$$

where $p \mid D$ means that only primes dividing $D$ are considered. Clearly, once the index $i$ exceeds $\max_{p \mid D} R(p)$, every prime in $D$ appears in $\epsilon_i$ to as high a power as it does in $D$. Hence these $\epsilon_i$ represent the zero element, and hence (4-2).

Also, if $\alpha_0(p) \ge \bar{\alpha}(p)$, then $R(p) = 0$, so we need consider only primes for which $\alpha_0(p) < \bar{\alpha}(p)$.

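We next get an estimate for $R(p)$ in terms of $\alpha_0(p)$. From (4-1) with $r = R(p)$, and the definition of $R(p)$ (which makes $\alpha_i(p) \le \bar{\alpha}(p) - 1$ for every $i \le R(p)$), we have for any $p \mid D$

$$R(p)\,\bar{\alpha}(p) \le \alpha_0(p) + \sum_{i=1}^{R(p)} \alpha_i(p) \le \alpha_0(p) + R(p)\,[\bar{\alpha}(p) - 1]. \qquad (4\text{-}3)$$

Comparing the extreme right and left sides gives

$$R(p) \le \alpha_0(p), \qquad (4\text{-}4)$$

so

$$R \le 1 + \max_{\substack{p \mid D \\ \alpha_0(p) < \bar{\alpha}(p)}} \alpha_0(p). \qquad (4\text{-}5)$$

Or, in words: if $p$ divides $\epsilon_0$ to a lower power than it divides $D$, then the rank of the submodule is less than or equal to one plus the largest of these powers.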

In particular, if $D$ has no repeated primes in its factorization, every such $\alpha_0(p)$ is 0, and the rank of the submodule is again at most 1; that is, $F$ is cyclic.
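Bound (4-5) needs only $\epsilon_0$ and the factorization of $D$. A small Python sketch follows, using trial-division factoring, which is adequate for the modest $D$ values of hand examples. The function names are illustrative. When no prime qualifies, the sketch returns 1, which is what (4-2) gives with every $R(p) = 0$.

```python
def prime_powers(n):
    """Factor the positive integer n by trial division; returns {p: exponent}."""
    out, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            out[p] = out.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        out[n] = out.get(n, 0) + 1
    return out


def multiplicity(p, x):
    """Power to which the prime p divides the nonzero integer x."""
    k = 0
    while x % p == 0:
        x //= p
        k += 1
    return k


def rank_bound(eps0, D):
    """Right side of (4-5): 1 + max alpha_0(p) over p | D with alpha_0(p) < alpha_bar(p)."""
    if eps0 == 0:
        raise ValueError("epsilon_0 must be nonzero")
    low = [multiplicity(p, eps0)
           for p, a_bar in prime_powers(abs(D)).items()
           if multiplicity(p, eps0) < a_bar]
    return 1 + max(low, default=0)


print(rank_bound(1, 12))   # 1: epsilon_0 prime to D, so F is cyclic
print(rank_bound(2, 12))   # 2
print(rank_bound(6, 30))   # 1: D has no repeated primes
```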

In all the cases where F is cyclic, it clearly has D elements or less. 
We will next show that this property of having D or less elements holds 
whether F is cyclic or not. 

The elementary divisor theorem has been used to exhibit $F$ as the direct sum of cyclic subgroups. If we give each subgroup the index of the corresponding $\epsilon_i$, it is obvious that each is annihilated by multiplication by $D/(\epsilon_i, D)$, where $(a, b)$ is used to indicate the greatest common divisor of $a$ and $b$. Consequently each subgroup contains at most $D/(\epsilon_i, D)$ elements, and the total number of elements then is at most

$$\prod_{i=0}^{r} \frac{D}{(\epsilon_i, D)}.$$

In this product we need consider only the $i$ up to the first one for which $\epsilon_i$ is divisible by $D$ (call it $r$), since the ratios are only 1's from that point on.
Using the notation of the preceding theorem and the division property of $D$, we have, for any prime $p$ in $D$,

$$R(p)\,\bar{\alpha}(p) \le \sum_{i=0}^{R(p)} \alpha_i(p).$$

In this range of $i$ the $\alpha_i(p)$ are individually less than $\bar{\alpha}(p)$, while for $i > R(p)$ we have [by the definition of $R(p)$] $\alpha_i(p) \ge \bar{\alpha}(p)$. Hence

$$\sum_{i=0}^{r} \min\big(\alpha_i(p), \bar{\alpha}(p)\big) \ge R(p)\,\bar{\alpha}(p) + \big(r - R(p)\big)\,\bar{\alpha}(p) = r\,\bar{\alpha}(p). \qquad (4\text{-}6)$$

But the sum on the left is exactly the power to which $p$ appears in $\prod_{i=0}^{r} (\epsilon_i, D)$. Eq. (4-6), repeated for each prime, then asserts that $D^r$ divides $\prod_{i=0}^{r} (\epsilon_i, D)$. Consequently the order of the group $F$ is less than or equal to

$$\prod_{i=0}^{r} \frac{D}{(\epsilon_i, D)} = \frac{D^{r+1}}{\prod_{i=0}^{r} (\epsilon_i, D)} \le \frac{D^{r+1}}{D^r} = D,$$

and hence is $D$ or less.
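The bound can be compared with a brute-force count on a small case. This Python sketch evaluates $\prod_i D/(\epsilon_i, D)$ and, independently, counts the subgroup generated by the rows of $\bar{F}$ modulo $D$. The matrix and its elementary divisors ($2$ and $12$, since the g.c.d. of the entries is 2 and the determinant is $-24$) are a hypothetical example, not one from the paper.

```python
from math import gcd, prod


def order_bound(eps, D):
    """prod_i D / (epsilon_i, D): the bound on the order of F derived above."""
    return prod(D // gcd(e, D) for e in eps)


def generated_subgroup(rows, D):
    """All integer combinations of the rows modulo D, found by closing under addition."""
    gens = [tuple(x % D for x in row) for row in rows]
    zero = tuple(0 for _ in gens[0])
    seen, frontier = {zero}, [zero]
    while frontier:
        v = frontier.pop()
        for g in gens:
            w = tuple((a + b) % D for a, b in zip(v, g))
            if w not in seen:
                seen.add(w)
                frontier.append(w)
    return seen


D = 8
fbar = [[2, 4], [6, 0]]   # hypothetical Fbar: its 2 x 2 determinant, -24, is divisible by D
eps = [2, 12]             # elementary divisors of fbar
print(order_bound(eps, D))                # 8
print(len(generated_subgroup(fbar, D)))   # 8, which does not exceed D
```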

If, as we have assumed throughout, the original problem contains a set of unit rows (i.e., it is given as an inequality problem rather than one involving equations), this result can be improved to show that the order of $F$ equals $D$. A derivation of this is sketched here. The transformation $P^{-1}$ sets up an isomorphism between the module $M(A)$ generated by the rows of $A$ over the integers and the module generated by the rows of $A^k$. The mapping $J$ of Section 2 then gives a homomorphism of this last module onto the group $F$ of inequalities, and hence a homomorphism of $M(A)$ onto $F$. Taking into account the nature of the mapping $J$, it follows that the elements going into zero in this correspondence are those generated by the rows of $A$ that go into unit rows in $A^k$. Designating the matrix of these rows by $P$, we have that $F$ is isomorphic to the factor module $M(A)/M(P)$. Since $A$ contains a complete set of unit rows, $M(A)$ is simply all the integer $(n+1)$-vectors. The elementary divisor theorem can now be applied to show that the number of elements in the factor module, and hence in $F$, is $|P| = D$. It is assumed here that $A$ also has been enlarged whenever a new row is added to its transform $A^k$. Although this isomorphism still holds when there is no set of unit rows in the original problem and $D$ is taken to be the product of pivots, the inequality given above is all that is obtained.
5. CHOOSING INEQUALITIES

When choosing among derived inequalities, the goal is certainly to choose an inequality in which the ratios $f_0/f_j$ are as large as possible. Geometrically this is choosing the new inequality whose equality plane intercepts the $t_j$ axes as far as possible from the origin. Since in one inequality some of these ratios may be large and some small, there is no clear-cut comparison among inequalities except in the case where all ratios from one inequality are greater than all ratios from another. However, there are many sensible criteria that can be used.

First is the one that has been used in actual programming so far: choose the largest $f_0$ in the matrix and use this row. The basis of this choice is largely an argument from ignorance. If you don't want to bother to look at the various $f_j$ in the rows or generate new fractional rows by addition or multiplication, you don't know anything about them. Consequently you try to get favorable $f_0/f_j$ ratios simply by choosing a large $f_0$.

This criterion is certainly subject to improvement in many ways, in- 
volving different amounts of work. 

This seems to be the crudest possible criterion; at the opposite extreme one can generate a whole series of much better criteria by using the Euclidean Algorithm. The Euclidean Algorithm explicitly computes the representation of the greatest common divisor of two integers $a$ and $b$ as an integer combination of $a$ and $b$,

$$(a, b) = ma + nb,$$

and of course the g.c.d. is the smallest positive number that can be represented in this way.
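A minimal Python sketch of the extended Euclidean Algorithm, returning the g.c.d. together with coefficients $m$ and $n$ such that $(a, b) = ma + nb$. It assumes nonnegative inputs, and the function name is illustrative.

```python
def ext_gcd(a, b):
    """Extended Euclidean Algorithm: return (g, m, n) with g = (a, b) = m*a + n*b."""
    old_r, r = a, b
    old_m, m = 1, 0
    old_n, n = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_m, m = m, old_m - q * m
        old_n, n = n, old_n - q * n
    return old_r, old_m, old_n


g, m, n = ext_gcd(12, 18)
print(g, m, n, m * 12 + n * 18)   # 6 -1 1 6
```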

The Euclidean Algorithm is especially useful when you want to predict the effect of multiplying a given row (inequality) by an integer. Suppose it is decided to multiply in such a way that $f_0$ is transformed into a new constant term $f_0'$ which is as large as possible, but still less than 1. If $f_0 = h/D$, find the representation of $(h, D)$:

$$(h, D) = mD + nh.$$

Multiplication by $n$ will then produce $f_0' = (h, D)/D$, which is the smallest possible nonzero $f_0'$, and, as is easily seen, multiplication by $D - n$ will produce $1 - (h, D)/D$, which is the largest possible $f_0'$. If the group $F$ is cyclic and the initial row is one of the generators, you will have found in this way the reduced inequality with the largest possible constant term.
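A minimal Python sketch of this rule, assuming the row is stored as `Fraction` values in $[0, 1)$ with common denominator $D$ and $0 < h < D$. The names `constant_term_multipliers` and `multiply_row` are introduced here for illustration, and the sample row is made up.

```python
from fractions import Fraction
from math import floor


def ext_gcd(a, b):
    """Return (g, m, n) with g = gcd(a, b) = m*a + n*b, for a, b >= 0."""
    old_r, r, old_m, m, old_n, n = a, b, 1, 0, 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_m, m = m, old_m - q * m
        old_n, n = n, old_n - q * n
    return old_r, old_m, old_n


def multiply_row(row, k):
    """k times a row of fractions, reduced modulo 1."""
    return [k * x - floor(k * x) for x in row]


def constant_term_multipliers(h, D):
    """For f_0 = h/D: multipliers giving the smallest and the largest nonzero constant term."""
    g, m, n = ext_gcd(D, h)   # (h, D) = m*D + n*h
    return n % D, (D - n) % D


row = [Fraction(3, 7), Fraction(2, 7), Fraction(5, 7)]   # (f_0, f_1, f_2) with D = 7
small, large = constant_term_multipliers(3, 7)
print(small, large)                # 5 2
print(multiply_row(row, small))    # [Fraction(1, 7), Fraction(3, 7), Fraction(4, 7)]
print(multiply_row(row, large))    # [Fraction(6, 7), Fraction(4, 7), Fraction(3, 7)]
```

Here $(3, 7) = 1$, so the two multipliers give constant terms $1/7$ and $6/7$, the extremes allowed by the denominator.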

This constant reference to a large constant term should not be taken too seriously. It may well be more important to have a row with small average $f_j$.

Another approach would be to ask for the inequality with the deepest possible intercept with, for example, the $t_1$ axis. This is obtained by computing $(h_1, D)$, where $f_1 = h_1/D$. Clearly $f_1 \times D/(h_1, D) \equiv 0$ modulo 1, so if $f_0 \times D/(h_1, D) \not\equiv 0$, the deepest possible intercept is obtained by multiplying by $D/(h_1, D)$ (the new inequality then does not involve $t_1$ at all, so its intercept on that axis is infinite). If $f_0 \times D/(h_1, D) \equiv 0$, then it is not hard to prove that the deepest intercept is obtained by multiplying through by $n_1$, where $n_1$ is given by

$$(h_1, D) = m_1 D + n_1 h_1.$$

This produces the multiple with the smallest nonzero $f_1$.

The row obtained in this way gives the deepest intercept of all possible 
multiples of the original row. Again if the original row was one of the 
generators of the group and the group is cyclic, the deepest possible inter- 
cept by a reduced inequality has been obtained. 
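A short Python sketch of this choice of multiplier, assuming the row $(f_0, f_1, \ldots, f_n)$ is stored as `Fraction` values with common denominator $D$ and that $f_j \ne 0$. The function name `deepest_intercept_multiplier` and the two sample rows are hypothetical.

```python
from fractions import Fraction
from math import floor, gcd


def ext_gcd(a, b):
    """Return (g, m, n) with g = gcd(a, b) = m*a + n*b, for a, b >= 0."""
    old_r, r, old_m, m, old_n, n = a, b, 1, 0, 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_m, m = m, old_m - q * m
        old_n, n = n, old_n - q * n
    return old_r, old_m, old_n


def frac(x):
    """Fractional part x - floor(x)."""
    return x - floor(x)


def deepest_intercept_multiplier(row, j, D):
    """Multiplier giving the deepest t_j intercept among the multiples of row."""
    h = row[j] * D
    if h.denominator != 1 or h % D == 0:
        raise ValueError("f_j must be a nonzero fraction of the form h_j/D")
    h = int(h)
    k = D // gcd(h, D)              # k * f_j == 0 modulo 1
    if frac(k * row[0]) != 0:
        return k                    # f_j vanishes but f_0 does not: no finite t_j intercept
    g, m, n = ext_gcd(D, h)         # (h_j, D) = m_j*D + n_j*h_j
    return n % D                    # gives the smallest nonzero f_j, namely (h_j, D)/D


print(deepest_intercept_multiplier([Fraction(1, 2), Fraction(2, 3)], 1, 6))   # 3
print(deepest_intercept_multiplier([Fraction(1, 3), Fraction(2, 3)], 1, 3))   # 2
```

In the second call, multiplying by 2 turns $(1/3, 2/3)$ into $(2/3, 1/3)$, moving the $t_1$ intercept $f_0/f_1$ from $1/2$ out to $2$.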

It would be reasonable to choose as $t_1$ the variable whose column is (lexicographically) least negative. This is equivalent to choosing to make the deepest possible intercept in the axis along which the objective function decreases most slowly.

Still another approach would be to throw on several or even many new 
inequalities and let the simplex method itself do the choosing by its 
choice of pivots. In the course of making all the elements in the zero 
column nonnegative, pivots will occur in some but not usually all new 
rows. Rows in which pivots occur represent inequalities that are used, 
while the rows whose constant term goes positive without a pivot in that 
row, represent inequalities that have been satisfied as a result of satis- 
fying other ones. These rows can be dropped as soon as this occurs. 

Although the very crudest sort of criteria have been very successful so 
far, it may well be that a criterion of the Euclidean Algorithm type used to 
get deep cuts in particular directions will be needed in problems in 
which D, and hence the number of reduced inequalities, becomes very 
large. 

6. SOME ADDITIONAL INEQUALITIES 

The reasoning that produced the inequality (2-5) can be summarized this way. The right-hand side is known to be nonnegative, and the left-hand side is written as $n + f_0$. Since the two differ by an integer, the right-hand side is either $f_0$, or $1 + f_0$, etc., and hence is $\ge f_0$. Now if (i) $f_0 = 0$, (ii) all the $a_j$ are strictly $> 0$, and (iii) the current $X$ is not an integer solution, a stronger inequality can be deduced. Since $f_0 = 0$ is assumed, the inequality would only say

$$\sum_{j=1}^{n} a_j t_j \ge 0.$$

Since all the $a_j$ are strictly positive, equality can be obtained only when all the $t_j$ are zero. However, by assumption (iii) the current $X$, which is given by these values of $t_j$, is not an integer solution; therefore equality does not hold for the $t_j$ of any integer solution, and, since the sum can only take the values $0, 1, 2, \ldots$, we must actually have

$$\sum_{j=1}^{n} a_j t_j \ge 1.$$

This inequality can then be improved by reducing the $a_j$ as in Section 2. However, those $a_j$ that are integers cannot be reduced to 0, as in Section 2, because of the requirement that the coefficients be strictly positive; consequently they are reduced to 1.

This reduction can be carried out systematically as before; the only differences from the procedure of Section 2 are in dealing with equations in which the $a_0$ term is an integer. The final inequalities obtained are simply these: all members of $F$ having $f_0 \ne 0$ are obtained just as before, and each member of $F$ having $f_0 = 0$ is replaced by a new vector having a 1 written wherever a 0 appeared in the original $F$ element. These represent the new inequalities, and they are easily obtained from $F$.
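The replacement rule is simple enough to state in a few lines of Python. This sketch assumes each member of $F$ is given as $(f_0, f_1, \ldots, f_n)$ with entries in $[0, 1)$. It reads the constant entry as the right-hand side, so a member with $f_0 = 0$ becomes an inequality with right-hand side 1. The function name is illustrative.

```python
from fractions import Fraction


def strengthened(member):
    """Section 6 rule applied to one element (f_0, f_1, ..., f_n) of F.

    Members with f_0 != 0 are kept as they are; a member with f_0 == 0 gets
    a 1 wherever it has a 0, including the constant position (the 1 on the
    right of sum_j a_j t_j >= 1).
    """
    if member[0] != 0:
        return list(member)
    return [Fraction(1) if x == 0 else Fraction(x) for x in member]


print(strengthened([0, 0, 0, 0]))   # [Fraction(1, 1), Fraction(1, 1), Fraction(1, 1), Fraction(1, 1)]
print(strengthened([0, Fraction(1, 3), 0, Fraction(2, 3)]))
```

Applied to the zero element, the rule gives the all-ones row, which corresponds to the inequality $t_1 + \cdots + t_n \ge 1$.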

It is important to remember that these inequalities can be used only 
when the current solution is non-integer. They cannot be used, for ex- 
ample, to convert from a noninteger to an integer matrix after an integer 
solution (0 column) is obtained. They also do not have the properties 
required in either of the two convergence proofs. 

It is interesting that by this method we obtain from the zero element of 
F the row vector 

(1, 1, . . . , 1) 

which arises when all the zeros are replaced by ones as prescribed above. 
This particular inequality is the one obtained by Dantzig [4] in a very 
simple and direct way at a time when only the inequalities described in 
Section 2 of this paper were known. 



7. LEXICOGRAPHICAL DUAL SIMPLEX METHOD 

George Dantzig's simplex method is the fundamental algorithm on which
this integer algorithm is based. Since the simplex method exists in many 
forms and many notations, the particular variant used in these proofs 
needs to be described. 

In order to facilitate proofs, a lexicographical† version of the simplex method is used. This ensures that even in cases of degeneracy the simplex method still goes through.

In order to facilitate adjoining or dropping equations as required in this algorithm, the dual simplex method (applied to the primal problem) will be used.‡

† See Dantzig, Orden, and Wolfe [5].

Let us assume, then, that the original inequalities of a linear programming problem have been turned into equations as in Eq. (1-1) and that the unit rows have also been added as in (1-1). We then have an array in which all the variables are expressed in terms of some of them, specifically in terms of the initial collection of nonbasic variables.

Thus the linear programming problem is to maximize

$$z = a_{0,0} + \sum_{j=1}^{n} a_{0,j}(-t_j)$$

subject to

$$\begin{aligned} x_1 &= a_{1,0} + \sum_{j=1}^{n} a_{1,j}(-t_j) \\ &\;\;\vdots \\ x_{m+n} &= a_{m+n,0} + \sum_{j=1}^{n} a_{m+n,j}(-t_j) \end{aligned} \qquad (7\text{-}1)$$

and the condition that all variables be nonnegative. It is to the above array, which includes the unit rows, that we will apply the lexicographical dual simplex method. We will refer to all coefficients on the right in Eq. (7-1) as the $a_{ij}$.

The fundamental operation used is pivoting, or Gaussian elimination on rows. Here this means that an element $a_{i_0,j_0}$ is designated as the pivot element. Then the $i_0$th equation is used to express $t_{j_0}$ in terms of $x_{i_0}$ and the $t_j$, $j \ne j_0$. This expression is then substituted for $t_{j_0}$ wherever $t_{j_0}$ appears on the right-hand side of Eqs. (7-1). The result is to make $x_{i_0}$ nonbasic in place of $t_{j_0}$, while $t_{j_0}$ becomes a basic variable. The effect on the matrix $A$, whose elements are the $a_{ij}$, is simply this: the $j_0$ column $\alpha_{j_0}$ is replaced by $(-1/a_{i_0,j_0})\,\alpha_{j_0}$; then the appropriate multiples, $a_{i_0,j}$, of this new column are added to the other columns so that in the resulting matrix $A^1$ the elements $a^1_{i_0,j}$ are zero except for $a^1_{i_0,j_0}$, which is $-1$. In other words $A^1$, the matrix whose coefficients express all the variables $x_i$ in terms of the new nonbasic set $t^1_j$, is simply



‡ It is also possible to dualize the problem and use the primal method on the dual problem. This has the advantage that new variables (columns) rather than new equations (rows) are added during the computation, and this is easier in the usual simplex machine codes. See Markowitz and Manne [2].
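$$A^1 = A^0 P^{-1},$$

where the $(n+1) \times (n+1)$ matrix

$$P^{-1} = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ -\dfrac{a_{i_0,0}}{a_{i_0,j_0}} & \cdots & -\dfrac{1}{a_{i_0,j_0}} & \cdots & -\dfrac{a_{i_0,n}}{a_{i_0,j_0}} \\ & & & \ddots & \\ & & & & 1 \end{pmatrix} \qquad (7\text{-}2)$$

(the identity matrix except for row $j_0$) is the negative of the inverse of the $(n+1) \times (n+1)$ matrix consisting of the $i_0$ row of $A$ and all the unit rows except the one involving $t_{j_0}$.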
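The column operations just described translate directly into code. Here is a minimal Python sketch, assuming the tableau is a list of rows of `Fraction` values with the constant column first and row 0 for $z$. The function name `pivot` and the small test problem (maximize $z = -(t_1 + t_2)$ subject to $x_1 = -3 + 2t_1 + t_2 \ge 0$) are illustrative only.

```python
from fractions import Fraction


def pivot(A, i0, j0):
    """Pivot the tableau A (list of rows, column 0 = constants) in place on A[i0][j0].

    Column j0 becomes (-1/a[i0][j0]) * column j0; then a[i0][j] times the new
    column is added to every other column j, which leaves -1 at (i0, j0) and
    zeros elsewhere in row i0.
    """
    p = A[i0][j0]
    new_col = [-row[j0] / p for row in A]
    mult = list(A[i0])                  # the a[i0][j], read before any update
    for r, row in enumerate(A):
        row[j0] = new_col[r]
        for j in range(len(row)):
            if j != j0:
                row[j] += mult[j] * new_col[r]


A = [[Fraction(x) for x in row] for row in
     [[0, 1, 1],      # z
      [-3, -2, -1],   # x1
      [0, -1, 0],     # t1 (unit row)
      [0, 0, -1]]]    # t2 (unit row)
pivot(A, 1, 1)
for row in A:
    print([str(x) for x in row])
# ['-3/2', '1/2', '1/2']
# ['0', '-1', '0']
# ['3/2', '-1/2', '1/2']
# ['0', '0', '-1']
```

After the pivot the $x_1$ row reads $(0, -1, 0)$, so $x_1$ is now nonbasic, and the constant column gives the trial solution $z = -3/2$, $t_1 = 3/2$.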

The dual simplex method consists of a succession of such pivot steps, resulting in a sequence of matrices $A^k$. With each $A^k$ is associated a "trial solution" obtained by setting the current nonbasic variables $t_j^k$ equal to zero and then choosing the current basic variables, the $x_i$, to satisfy the equations, i.e.,

$$x_i^k = a_{i,0}^k.$$

The matrix and trial solution are usually said to be "dual feasible" if the $a_{0,j}$, $j \ne 0$, are all nonnegative. They are feasible (or primal feasible) if all the $a_{i,0}$, $i \ne 0$, are nonnegative. If $A^k$ is both primal and dual feasible, the associated trial solution is the solution to the linear programming problem, since the primal feasible property makes the $x_i$ values nonnegative and, since

$$z = a_{0,0} + \sum_{j=1}^{n} a_{0,j}(-t_j)$$

with all $a_{0,j}$ nonnegative, $z = a_{0,0}$ is the largest possible value of $z$.

We will depart from this nomenclature in one way, because a lexicographical method is being used. We will say that a column vector $\beta$ is (lexicographically) positive ($\beta > 0$) if the first nonzero entry of $\beta$, counting from the top down, is positive. Negative is defined similarly. A column vector $\beta$ is greater than another column vector $\beta'$ if $\beta - \beta' > 0$.

Using positive and negative for columns in this sense, we will say that a matrix is dual feasible if all columns $\alpha_j$, $j \ne 0$, are positive. This notion coincides with the meaning of dual feasible given above except when some of the elements of the top row are zero.
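These definitions are easy to state in code. In the Python sketch below, which is illustrative and works with integers or `Fraction`s, the second test tableau has a nonnegative top row, so it is dual feasible in the ordinary sense, yet it fails the lexicographic test because its column 2 begins $0, -1$.

```python
def lex_positive(col):
    """True if the first nonzero entry of col, counting from the top, is positive."""
    for x in col:
        if x != 0:
            return x > 0
    return False                 # the zero column is neither positive nor negative


def lex_greater(beta, beta_prime):
    """beta > beta' in the lexicographic sense: beta - beta' is positive."""
    return lex_positive([x - y for x, y in zip(beta, beta_prime)])


def is_dual_feasible(A):
    """Lexicographic dual feasibility: every column j != 0 of A is positive."""
    return all(lex_positive([row[j] for row in A]) for j in range(1, len(A[0])))


print(is_dual_feasible([[0, 1, 1], [-3, -2, -1], [0, -1, 0], [0, 0, -1]]))   # True
print(is_dual_feasible([[0, 1, 0], [-3, -2, -1], [0, -1, 0], [0, 0, -1]]))   # False
print(lex_greater([0, 1, -5], [0, 0, 7]))                                    # True
```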

We will assume throughout that our starting matrices $A$ are such that they can be led into dual feasible form by a succession of pivot steps. (This is always the case if, for example, the convex body cut out by the original inequalities is bounded.) It is also worth noting that, since the transforming matrices $P^{-1}$ are nonsingular and $A$, because of the unit rows, is of maximal rank, no all-zero columns can appear in any $A^k$.

Once a dual feasible form is achieved, the dual simplex method proceeds to obtain the solution in the following way: choose some row $i$ ($i \ne 0$) with $a_{i,0}$ negative. Then consider the columns $\alpha_j$ for which $(1/a_{i,j})\,\alpha_j$ is negative, and select from these the one for which this vector is least negative. Then pivot on $a_{i,j}$. It is easily verified that this pivot step results in a new matrix $A^1$ in which the columns are still positive, and since

$$-\frac{a_{i,0}}{a_{i,j}}\,\alpha_j$$

(a negative vector) has been added to $\alpha_0$, it follows that $\alpha_0^1 < \alpha_0$.

A succession of such steps results in a succession of strictly decreasing $\alpha_0$. There can only be a finite number of these, since there are only a finite number of possible sets of nonbasic variables, and any choice uniquely determines an $\alpha_0$. Consequently the process must stop. This can happen either if there are no negative elements (below the top entry) in the current $\alpha_0$, or if there is a negative $a_{i,0}$ but no negative columns $(1/a_{i,j})\,\alpha_j$. In the first case the solution has been obtained; in the second it is easily seen that the negative value $a_{i,0}$ is the largest value that the variable $x_i$ can attain, and consequently no solution in nonnegative numbers exists.
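Putting the rule together gives the whole method in a few lines. This is a Python sketch, not the machine code mentioned in Section 9. It assumes a dual feasible starting tableau stored as rows of `Fraction` values (row 0 for $z$, column 0 for constants), and it always picks the first row with a negative constant term, whereas the text allows any such row. Python's lexicographic list comparison finds the least negative $(1/a_{i,j})\alpha_j$.

```python
from fractions import Fraction


def pivot(A, i0, j0):
    """Pivot in place on A[i0][j0] by the column operations of this section."""
    p = A[i0][j0]
    new_col = [-row[j0] / p for row in A]
    mult = list(A[i0])
    for r, row in enumerate(A):
        row[j0] = new_col[r]
        for j in range(len(row)):
            if j != j0:
                row[j] += mult[j] * new_col[r]


def dual_simplex(A):
    """Lexicographic dual simplex on a dual feasible tableau; returns a status string."""
    while True:
        # a row i != 0 with a negative constant term (here simply the first one)
        i = next((r for r in range(1, len(A)) if A[r][0] < 0), None)
        if i is None:
            return "optimal"
        cols = [j for j in range(1, len(A[0])) if A[i][j] < 0]
        if not cols:
            return "infeasible"      # a_{i,0} < 0 is the largest value x_i can attain
        # (1/a_ij) * alpha_j is negative for these j; take the least negative one
        j = max(cols, key=lambda c: [row[c] / A[i][c] for row in A])
        pivot(A, i, j)


# maximize z = -(t1 + t2) subject to x1 = -3 + 2 t1 + t2 >= 0 (rows z, x1, t1, t2)
A = [[Fraction(x) for x in row] for row in
     [[0, 1, 1], [-3, -2, -1], [0, -1, 0], [0, 0, -1]]]
status = dual_simplex(A)
print(status, [str(row[0]) for row in A])   # optimal ['-3/2', '0', '3/2', '0']
```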



8. FINITENESS PROOFS 

In these proofs we will use the lexicographical dual simplex method 
described in Section 7. It is not implied that this simplex method need 
be used in practice or that it is necessary to the proof. It is simply that 
its use in the proof has reduced the original rather long and tedious proofs 
to relatively simple ones. 

Let us assume then that we have obtained a (lexicographically) dual 
feasible solution, and that in all succeeding pivot steps we choose pivot 
elements in accordance with the lexicographical dual simplex method. 

After each pivot, then, we obtain a new "trial solution" $\alpha_0^{k+1}$ which is strictly lexicographically smaller than its predecessor $\alpha_0^k$.

We will also assume that some lower bound is known for the value of $z$. That is, we assume it is known that if an integer solution exists, it gives a $z$-value $\ge$ some known (possibly large negative) $M$. This is always the case if we are dealing with a bounded convex body.

First Method of Proof 

Let us assume that we adopt the following procedure. Proceed with the simplex method until an optimal solution is obtained; if this solution $\alpha_0 = (a_{0,0}, a_{1,0}, \ldots, a_{m+n,0})$ is not in integers, let $a_{i,0}$ be the first noninteger component. Then introduce the new equation

$$s = -f_{i,0} - \sum_{j=1}^{n} f_{i,j}(-t_j),$$

where $f_{i,j}$ is the fractional part of $a_{i,j}$, and adjoin it to the bottom of the existing set. (We know from Section 2 that any nonnegative integer solution of the original set of equations satisfies the new set and gives the new variable $s$ a nonnegative integer value.)

We now apply the dual simplex method to obtain a new feasible maximum, for the $-f_{i,0}$ term has introduced a negative element into the zero column. This new application of the simplex method either gives a new optimal solution or indicates that no feasible solution exists. In this second case we know that no solution to the original problem exists in integers. In the first case, if the new maximum is still noninteger, we repeat the process. If remaximization is always possible, we will show that an integer solution is attained in a finite number of repetitions.
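The procedure of this first method can be sketched in Python as follows, using the tableau conventions of Section 7: row 0 is $z$, column 0 holds the constants, and entries are `Fraction`s. It is an illustration under stated assumptions, not the program of [8]. It assumes a dual feasible start and always uses the first row with a negative constant in the dual simplex step. It keeps the $s$ rows rather than dropping them, and it stops after a fixed number of cuts as a safeguard. On the small made-up problem "maximize $z = -(t_1 + t_2)$ subject to $2t_1 + t_2 \ge 3$", a single cut suffices.

```python
from fractions import Fraction
from math import floor


def pivot(A, i0, j0):
    """Pivot in place on A[i0][j0] by the column operations of Section 7."""
    p = A[i0][j0]
    new_col = [-row[j0] / p for row in A]
    mult = list(A[i0])
    for r, row in enumerate(A):
        row[j0] = new_col[r]
        for j in range(len(row)):
            if j != j0:
                row[j] += mult[j] * new_col[r]


def dual_simplex(A):
    """Lexicographic dual simplex (first row with a negative constant is used)."""
    while True:
        i = next((r for r in range(1, len(A)) if A[r][0] < 0), None)
        if i is None:
            return "optimal"
        cols = [j for j in range(1, len(A[0])) if A[i][j] < 0]
        if not cols:
            return "infeasible"
        j = max(cols, key=lambda c: [row[c] / A[i][c] for row in A])
        pivot(A, i, j)


def first_method(rows, max_cuts=50):
    """Optimize, add the cut from the first noninteger component, reoptimize, repeat."""
    A = [[Fraction(x) for x in row] for row in rows]
    original = len(A)                 # only the original m + n + 1 rows are tested
    for _ in range(max_cuts):
        if dual_simplex(A) == "infeasible":
            return "no integer solution", A
        i = next((r for r in range(original) if A[r][0].denominator != 1), None)
        if i is None:
            return "integer optimum", A
        # s = -f_{i,0} - sum_j f_{i,j} (-t_j), adjoined at the bottom
        A.append([-(x - floor(x)) for x in A[i]])
    raise RuntimeError("cut limit reached")


status, A = first_method([[0, 1, 1], [-3, -2, -1], [0, -1, 0], [0, 0, -1]])
print(status, [str(row[0]) for row in A])   # integer optimum ['-2', '1', '2', '0', '0']
```

Here the first optimum has $z = -3/2$, the cut is taken from row 0, and one more dual simplex pivot (on the element $-1/2$) reaches the integer optimum $z = -2$, $t_1 = 2$, $t_2 = 0$. The product of the two pivots is $(-2)(-1/2) = 1$, so the final matrix is all integer.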

Let us suppose the contrary. Then the process would produce an infinite sequence of trial solutions

$$\alpha_0^1 > \alpha_0^2 > \cdots > \alpha_0^k > \cdots.$$

We will consider only the first $m + n + 1$ components of these trial solutions. Since the trial solutions are monotone decreasing, the successive first components $a_{0,0}^k$ are also monotone decreasing; since we assume that $z$ has a lower bound $M$, so do the $a_{0,0}^k$, or else clearly no integer solution exists. Let $n_{0,0}$ be the largest integer such that $n_{0,0} \le a_{0,0}^k$ for all $k$. We will first show that $a_{0,0}^k = n_{0,0}$ for all $k$ greater than some $k_0$.

From the definition of $n_{0,0}$ it follows that after a certain point the $a_{0,0}^k$ can all be written as $n_{0,0} + f_{0,0}^k$ with $0 \le f_{0,0}^k < 1$. A finite number of pivot steps after this point is reached, a new maximum must be obtained. If $k'$ is the index at this point and $f_{0,0}^{k'}$ is not zero, our procedure will select the fractional parts of the 0-row to form the new equation

$$s = -f_{0,0}^{k'} - \sum_{j=1}^{n} f_{0,j}^{k'}(-t_j).$$

The dual simplex method next selects a pivot element from this new row. If the $j_0$ column is selected, the new value of $z$ after the pivot is

$$a_{0,0}^{k'+1} = a_{0,0}^{k'} - \frac{f_{0,0}^{k'}}{f_{0,j_0}^{k'}}\, a_{0,j_0}^{k'}.$$

Since all the $a_{0,j}^{k'}$ are nonnegative at an optimum point, we have

$$a_{0,j_0}^{k'} \ge f_{0,j_0}^{k'}$$

and hence

$$a_{0,0}^{k'+1} \le a_{0,0}^{k'} - f_{0,0}^{k'} = n_{0,0},$$

so for $k$ equal to or greater than $k' + 1$, $a_{0,0}^k = n_{0,0}$.

Since the first component is now fixed at $n_{0,0}$, the second components of the trial solutions are now monotone decreasing and bounded from below by zero for $k \ge k' + 1$. (If the component fell below zero, reoptimization would then fail to be possible.)

We can now repeat word for word the argument given above for the first component by simply changing the first subscript from 0 to 1. This shows that the second component attains and remains at some integer value. The only point that requires any explanation is the reason why $a_{1,j_0}^{k''}$ should be nonnegative. $a_{1,j_0}^{k''}$ is nonnegative because the first entry in the $j_0$ column is 0, and, since the column itself is (lexicographically) positive, the second element, which is $a_{1,j_0}^{k''}$, could not be negative. The top element $a_{0,j_0}^{k''}$ is 0 because otherwise the pivot step would actually strictly decrease $a_{0,0}$ below its already attained minimum value $n_{0,0}$.

Just as above then we can conclude that the second component attains 
some integer value and then remains at it. This argument is then repeated 
for all the original m + n + 1 variables. This gives the integer solution. 

What has been shown, of course, is that an integer column $\alpha_0$ is eventually attained. To obtain an all-integer matrix you simply continue. Take any row of the matrix that still contains fractions, and use the fractional parts as a new relation as before. Of course now the $f_0$ term will be zero. Consequently the next pivot step will leave all the values in the zero column unchanged, and we are still at the same optimum point and in optimal form. If fractions still remain anywhere, the process is repeated. Since the number $D$ which forms the denominator of all the fractions (as discussed in Section 4) is the product of all pivots, and these pivot elements are now all proper fractions, $|D|$ constantly decreases. Since $|D|$ is an integer and $\ne 0$, either $|D|$ becomes 1, or else the process stops because the matrix has no fractions in it. Actually these two cases coincide; if the matrix is all integer, $|D|$ is necessarily 1. To see this, remember that the transforming matrix $P^{-1}$ has determinant $1/D$. The inverse of this transformation (i.e., $P$) sends $n + 1$ rows of $A^k$ into the unit rows of $A$. Hence $P$ is the (negative) inverse of a square submatrix of $A^k$. If $A^k$ is all integer, this implies that the determinant of $P$ is $\pm 1/D_2$, where $D_2$, the determinant of the square matrix in question, is an integer. Since the determinant of $P$ also equals $D$, it follows that $|D| = |D_2| = 1$. Hence the final matrix is related to the initial one by a unimodular transformation.

The procedure given above can be greatly modified without changing the proof. For example, it is not necessary to choose the new equation at each optimum point by the rule given above. If this rule of choice is applied every tenth time or every hundredth time, the proof still goes through, and you are free to add any new relation or relations whatsoever at all the other optimum points. Another way to relax the choice restriction is to use, instead of the fractional parts of the $i$ row, any multiple $m$ of the row of fractional parts that has the property $m f_{i,0} < 1$. This new relation provides a stronger inequality and is easily seen to have the same effect on the next trial solution as does the inequality used in the proof.

This proof, as well as the next one, goes through unchanged if the row 
representing a new variable (s variable) is dropped as soon as the s vari- 
able involved becomes positive, or more exactly, as soon as the s variable 
leaves the nonbasic set and becomes basic. Dropping such a row does not 
affect the lexicographical positiveness of any column as this is determined 
by the topmost element, and a column with nothing but zeros above the 
s-row cannot occur (see Section 7). 

Since there can be at most $n$ nonbasic $s$ variables, the number of additional rows need never exceed $n + 1$.

Dropping the extra inequality is allowable, of course, as only the 
original inequalities need be satisfied to give a solution. 

Second Method of Proof 

This proof will show the finiteness of a different sort of process, one 
in which there is no distinct repetition of optimization, new relation, re- 
optimization, etc., but rather a process in which (at least up to a certain 
point) the adding of new equations and pivot steps of the dual simplex 
method can be interspersed at random. 

If it is assumed that the first step is to obtain a dual feasible matrix, 
the remaining choices to be made in carrying out the algorithm can be 
summarized in this way. 

If the matrix is both nonoptimal and contains fractions, one can either 
make a step of the dual simplex method, or first choose and add some re- 
duced inequality, and then make a pivot in the row of the new inequality. 
If the matrix is in integers but not optimal, one must make some step of 
the dual simplex method. If the matrix is optimal but not all in integers, 
one must choose some new reduced inequality, add it, and pivot on its row. 
If the matrix is in integers and optimal, the problem ends and no further 
steps are made. 

The sequence of choices results in a sequence of trial solutions. In order to have an infinite sequence of trial solutions $\alpha_0^k$, let us assume that after an integer optimal matrix is achieved, the corresponding trial solution is simply repeated in the sequence from that point on.

In the following, $D^k$ indicates the value of $D$ after the $k$th pivot.

We can now assert that if an integer solution exists, any way of making the above choices which ensures that

$$\liminf_{k \to \infty} |D^k| < \infty$$

will actually attain the solution (in fact an integer optimal matrix) in a finite number of steps.

An example of such a procedure would be to make all the choices quite freely until $|D^k|$ rises above some predetermined value $N$ (if it ever does). With $|D^k| > 1$ the matrix cannot be all integer (this is easy to prove), so it is possible to add a new relation. Add any new relation and pivot on its row, thus producing a new $D^{k+1}$ with $|D^{k+1}| = |f|\,|D^k|$, where $f$ is the pivot element (of course $|f| < 1$). If $|D^{k+1}|$ is still greater than $N$, repeat the process. Eventually the $D$ value will decrease below $N$, and then choices can be made quite freely again. Clearly this process provides

$$\liminf_{k \to \infty} |D^k| \le N < \infty.$$
To proceed with the proof, let us suppose that we have a procedure which guarantees a sequence of matrices with $\liminf_{k \to \infty} |D^k| < \infty$. Since the $D^k$ are integers, there is a value $M$ which is attained by $D^k$ for an infinity of values of $k$. If we denote these $k$ by $k_i$, then we have a decreasing subsequence of trial solutions

$$\alpha_0^{k_1} \ge \alpha_0^{k_2} \ge \cdots \ge \alpha_0^{k_i} \ge \cdots,$$

and all entries in these vectors are of the form $m/M$ with $m$ integer.
Since we assume the existence of an integer solution, the first components 
are bounded from below, decrease, and hence have a limit point. Since 
they are all rational numbers with denominator M, they can have a limit 
point only if they reach some final value and then repeat it. Consequently the first component attains and remains at some fixed value for $k_i$ greater than some fixed $k_0$. The second component must be monotone decreasing from this point on. It is $\ge 0$, since a negative value would imply that the problem now has no feasible solutions (further dual simplex steps can only make it more negative), which would in turn imply that the original problem has no integer solution, contrary to assumption. Consequently the second component has a limit value which it too attains. This argument is repeated for all components, so finally the $\alpha_0^{k_i}$ reach some final vector $\bar{\alpha}_0$ which is then repeated. The only step of the algorithm which does not cause a strict decrease in $\alpha_0$ is the addition of a reduced inequality with $f_0 = 0$ followed by a pivot on this row. A finite sequence of these steps will produce an integer matrix. Since there is no further decrease in $\alpha_0$ at any later step, this must be an integer optimal matrix. This ends the proof.

Unlike the first proof this one assumes that an integer solution exists 
and shows that the process finds that solution in a finite number of steps. 
The first proof either found a solution or else showed that none existed. 
If the second procedure is applied just as described to a problem not 
having an integer solution, this fact is not guaranteed to become apparent 
in any obvious way. This situation can be remedied if the procedure 
adopted provides for some periodic reoptimization (obtaining a primal 
feasible solution). Then at any point the impossibility of reoptimization 
indicates nonexistence of a solution. 

A particularly intriguing procedure of this type was suggested by E. M. L. Beale and stimulated the search for the above finiteness proof. In this procedure $N$ is taken to be 1; thus the matrix is constantly being returned to all-integer form.
9. MISCELLANEOUS COMMENTS

Computational Experience 

This is largely limited to small problems. Many small problems, 
similar to those in Section 10, have been done by hand with very en- 
couraging results. The method was programmed on an E101 computer 
and the results, again on small problems, were also encouraging. During a stay at RAND, a FORTRAN I program [8] was written for the RAND 704. Since only single precision arithmetic was available in FORTRAN I, the method was programmed exactly, numerators and the denominator (D) being stored separately. (This happens to be very easy to do in these integer problems, and is shown in one example in Section 10.) This numerical approach avoids dealing with round-off error. However, it has the drawback that the program fails if some of the numbers involved get too large and overflow. Eight problems were run with the following results:

E    5   5   7   7  12  12  15  15
P    6   4   6   9   8  13   4  20

Here E is the number of inequalities in the original linear programming 
problem, and P is the number of pivots required after reaching the orig- 
inal noninteger maximum by the simplex method. The number of variables 
involved in the inequalities was approximately the same as the number of 
inequalities in each case. Of course the number of variables was later 
approximately doubled when the inequalities were turned into equations 
by the addition of slack variables. 

This experimental program involved the crudest possible criterion (the max $f_{i,0}$ criterion) and added inequalities one at a time. Only small $D$ numbers (in the hundreds at most) were encountered in these eight problems. One other fifteen-inequality problem failed when the run ended in an overflow.

Some Direct Extensions 

Extension of the method to the case where there are equations in the 
original problem in place of inequalities is straightforward. So is the ex- 
tension to the case where some of the variables are unrestricted in sign. 
Also the inequalities of Section 2 are still valid if the starting matrix is
not a matrix of integers, as no use was made of this fact in Section 2. The
main point of having all integer inequalities was to assure that the slack 
variables are integers. If this is assumed separately, or if the problem is 
an equation problem with no slacks, the integer matrix is not needed. 

An example of a problem in which some of the variables are unrestricted in sign is the problem of finding the greatest common divisor of two (or more) integers $a_j$. The g.c.d. is the smallest positive integer that can be expressed as an integer combination of the numbers involved. Hence it is the solution to the problem

minimize $z = a_1x_1 + \cdots + a_nx_n$ subject to $z \ge 1$.

The $x_j$ are integers, and the only variable restricted in sign is the slack introduced by the inequality. If the method of this report is applied here, it solves the problem by doing the Euclidean Algorithm. There are various forms of the Euclidean Algorithm, and they correspond to the various ways the method can be applied when unrestricted variables are present.
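A small Python sketch of the quantity this problem computes may help. It returns the g.c.d. of a list of integers together with integer multipliers $x_j$ that attain $z = a_1x_1 + \cdots + a_nx_n = \gcd$. It runs the ordinary extended Euclidean Algorithm directly rather than the tableau method of this report, and the function name `gcd_combination` is hypothetical.

```python
def gcd_combination(nums):
    """Return (g, xs) with g = gcd(nums) >= 0 and sum(x * a) == g.

    Plain extended Euclidean Algorithm folded over the list; an
    illustration only, not the tableau procedure of the report.
    """
    g, xs = 0, []
    for a in nums:
        # Invariant before this step: g == sum of xs[i] * nums[i] so far.
        old_r, r = g, a
        old_s, s = 1, 0   # coefficients of g
        old_t, t = 0, 1   # coefficients of a
        while r != 0:
            q = old_r // r
            old_r, r = r, old_r - q * r
            old_s, s = s, old_s - q * s
            old_t, t = t, old_t - q * t
        # Now old_r == old_s * g + old_t * a; make it nonnegative.
        if old_r < 0:
            old_r, old_s, old_t = -old_r, -old_s, -old_t
        xs = [x * old_s for x in xs] + [old_t]
        g = old_r
    return g, xs


g, xs = gcd_combination([12, 18, 10])
print(g, xs)  # 2 [-2, 2, -1]
```

Here $-2 \cdot 12 + 2 \cdot 18 - 1 \cdot 10 = 2$, so $z = 2$ is attained with integer $x_j$, and no smaller positive combination exists.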

An example of a problem involving only unrestricted variables is the 
problem of solving a set of linear Diophantine equations, i.e., of obtaining
all integer solutions to a set of linear equations with no maximum prob- 
lem and no restriction in sign on the variables involved. The method of 
this report also solves this problem in a very simple and rapid way. 
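The following hedged Python sketch illustrates why integer (unimodular) column operations are enough for linear Diophantine equations. It is not the report's tableau procedure. It applies Euclidean-style column operations to $A$, recording them in a unimodular matrix $U$, until $H = AU$ is in lower echelon form. It then solves $Hy = b$ by forward substitution and returns $x = Uy$. The function name `solve_diophantine` is hypothetical.

```python
def solve_diophantine(A, b):
    """Return one integer solution x of A x = b, or None if none exists."""
    m, n = len(A), len(A[0])
    H = [row[:] for row in A]                          # becomes A U
    U = [[int(i == j) for j in range(n)] for i in range(n)]

    def col_op(dst, src, q):                           # column dst -= q * column src
        for M in (H, U):
            for row in M:
                row[dst] -= q * row[src]

    def col_swap(c1, c2):
        for M in (H, U):
            for row in M:
                row[c1], row[c2] = row[c2], row[c1]

    pivots, col = [], 0                                # (row, column) pairs
    for r in range(m):
        if col == n:
            break
        while True:                                    # Euclidean Algorithm across row r
            nz = [j for j in range(col, n) if H[r][j] != 0]
            if not nz:
                break
            col_swap(col, min(nz, key=lambda j: abs(H[r][j])))
            for j in range(col + 1, n):
                col_op(j, col, H[r][j] // H[r][col])   # leaves a remainder
            if all(H[r][j] == 0 for j in range(col + 1, n)):
                break
        if H[r][col] != 0:
            pivots.append((r, col))
            col += 1

    y, piv_col = [0] * n, dict(pivots)
    for r in range(m):                                 # forward substitution
        residual = b[r] - sum(H[r][j] * y[j] for j in range(n))
        if r in piv_col:
            c = piv_col[r]
            if residual % H[r][c] != 0:
                return None                            # no integer solution
            y[c] = residual // H[r][c]
        elif residual != 0:
            return None                                # inconsistent equations
    return [sum(U[i][j] * y[j] for j in range(n)) for i in range(n)]


print(solve_diophantine([[1, 1, 1], [1, 2, 3]], [6, 14]))  # [-2, 8, 0]
```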

It is interesting that in this last application the method is still suc- 
cessful (for purely algebraic reasons) even though there are no sign re- 
stricted variables present, and hence an interpretation in terms of ad- 
ditional inequalities, or cutting off parts of a convex body, is completely 
inapplicable. 

In a different direction is the question of solving not one integer pro- 
gramming problem, but a family of such problems. Here we can make use 
of the fact that the final matrix is a unimodular transform of the original 
one, for if an integer change is made in the right-hand sides of the original inequalities, the result on the final matrix is also an integer change. If the right-hand sides are decreased by an integer step, the various extra inequalities deduced during the computation are still valid; thus the s variables are still required to be nonnegative. The effect of such a decrease on the final matrix is merely to add certain of the columns to the zero column. If the elements of the zero column remain nonnegative, the solution to the new problem has been obtained. If some go negative, some additional steps are required. This is entirely analogous to the usual notion of parametric programming.

The Mixed Problem 

This is the problem in which some, but not all, of the variables in- 
volved are required to be integers. This is not a direct extension. How- 
ever extensions to this case have been made, first by Beale [9] and later, 
more directly, by Gomory [10]. Both methods are almost completely computationally untested.

Finiteness Proofs 

Although much work remains to be done on the material in all sections 
of this report, the situation seems especially unsatisfactory in Section 8. 
The finiteness proofs given there allow a good deal of choice, especially 
if the choices of inequality that they dictate are made only occasionally. However, there is no proof that if the choices described in Section 5 are made all the time, the integer answer will be obtained; yet these choices seem to be the desirable ones from the point of view of making rapid progress and have been used in the machine programs. Actually, in all problems done so far, any method involving the reduced inequalities has worked. It would be desirable to know whether or not this is true in general.

Applications 

Examples of combinatorial problems reducible to integer programming 
problems were given by Dantzig [11]. In another direction, the fact that nonconvex problems are reducible to (usually mixed) integer programming problems has been known at RAND for some time. A device involving a nonconvex objective function was given by Markowitz and Manne [2], and the subject was first treated systematically by Dantzig [12] in a paper containing many interesting applications.

Round-off Error 

Most of the method does not appear to pose problems of round-off error 
very different from those encountered in the usual simplex method. It 
seems that the round-off problems arising in those features that are dif- 
ferent from the simplex method can be overcome. 



10. EXAMPLES 

In these examples we will not require a lexicographical simplex method. 
We will follow A. W. Tucker in using a "condensed" form and so will not 
include unit rows (see the examples). 

The simplex rule for choice of pivot element in both primal and dual methods can be summarized as follows. If the problem is primal [dual] feasible, i.e., $a_{i,0} \ge 0$, $i \ge 1$ [$a_{0,j} \ge 0$, $j \ge 1$], choose a column $j_0$ [a row $i_0$] with first element $a_{0,j_0}$ [$a_{i_0,0}$] negative. From among the positive [negative] elements in this column [row], select the one for which the ratio $a_{i,0}/a_{i,j_0}$ [$a_{0,j}/a_{i_0,j}$] attains its least absolute value. This is the pivot element $a_{i_0,j_0}$.
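The two selection rules can be written out directly. The Python sketch below assumes that the condensed array is a list of rows, with row 0 holding $z$ and column 0 holding the constants. Where the text allows any negative entry, the sketch takes the most negative one. That choice is an assumption of the sketch, not part of the rule. The function names are hypothetical.

```python
from fractions import Fraction


def choose_primal_pivot(a):
    """Primal rule: return (i0, j0), or None if no a[0][j] (j >= 1) is negative."""
    cols = [j for j in range(1, len(a[0])) if a[0][j] < 0]
    if not cols:
        return None
    j0 = min(cols, key=lambda j: a[0][j])           # most negative: an assumption
    rows = [i for i in range(1, len(a)) if a[i][j0] > 0]
    if not rows:
        raise ValueError("objective unbounded in column %d" % j0)
    i0 = min(rows, key=lambda i: abs(Fraction(a[i][0]) / a[i][j0]))
    return i0, j0


def choose_dual_pivot(a):
    """Dual rule: return (i0, j0), or None if no a[i][0] (i >= 1) is negative."""
    rows = [i for i in range(1, len(a)) if a[i][0] < 0]
    if not rows:
        return None
    i0 = min(rows, key=lambda i: a[i][0])           # most negative: an assumption
    cols = [j for j in range(1, len(a[0])) if a[i0][j] < 0]
    if not cols:
        raise ValueError("row %d shows the problem is infeasible" % i0)
    j0 = min(cols, key=lambda j: abs(Fraction(a[0][j]) / a[i0][j]))
    return i0, j0


# Starting array of Example 1 (columns 1, -x1, -x2, -x3).
start = [[0, -4, -5, -1],
         [10, 3, 2, 0],
         [11, 1, 4, 0],
         [13, 3, 3, 1]]
print(choose_primal_pivot(start))  # (2, 2), the entry 4 marked with an asterisk
```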

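The effect of pivoting on pivot element $a_{i_0,j_0}$ can be summarized as follows. A new array is obtained in which the variables at the end of the $i_0$ row and the top of the $j_0$ column have been exchanged, and in which the new coefficients $a'_{i,j}$ are given by

$$a'_{i,j} = a_{i,j} - \frac{a_{i,j_0}\,a_{i_0,j}}{a_{i_0,j_0}}, \qquad i \ne i_0,\ j \ne j_0,$$

$$a'_{i_0,j} = \frac{a_{i_0,j}}{a_{i_0,j_0}}, \quad j \ne j_0, \qquad a'_{i,j_0} = -\frac{a_{i,j_0}}{a_{i_0,j_0}}, \quad i \ne i_0,$$

$$a'_{i_0,j_0} = \frac{1}{a_{i_0,j_0}}.$$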
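The pivot rules can be checked mechanically with exact rational arithmetic. This Python sketch uses `fractions.Fraction`; the function name `pivot` is hypothetical. The pivot-row, pivot-column and pivot-element conventions are the ones the example arrays follow: divide by $p$, divide by $-p$, and invert. The usage replays three pivots from the starting array of Example 1.

```python
from fractions import Fraction


def pivot(a, i0, j0):
    """Return the condensed array obtained by pivoting on a[i0][j0]."""
    p = Fraction(a[i0][j0])
    out = []
    for i, row in enumerate(a):
        new_row = []
        for j, x in enumerate(row):
            x = Fraction(x)
            if i == i0 and j == j0:
                new_row.append(1 / p)                    # pivot element
            elif i == i0:
                new_row.append(x / p)                    # pivot row
            elif j == j0:
                new_row.append(-x / p)                   # pivot column
            else:                                        # everything else
                new_row.append(x - Fraction(row[j0]) * Fraction(a[i0][j]) / p)
        out.append(new_row)
    return out


a = [[0, -4, -5, -1], [10, 3, 2, 0], [11, 1, 4, 0], [13, 3, 3, 1]]
a = pivot(a, 2, 2)   # pivot on 4
a = pivot(a, 1, 1)   # pivot on 10/4
a = pivot(a, 3, 3)   # pivot on 1
print([str(v) for v in a[0]])  # ['97/5', '1/5', '2/5', '1'], i.e. 194/10, 2/10, 4/10, 1
```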
When by a succession of pivots an array is obtained which is both primal and dual feasible (and hence optimal), a new equation (or equations) representing a new inequality is added. This equation is simply

$$s = -f_0 - \sum_{j=1}^{n} f_j\,(-t_j)$$

where the $f_j$ are the fractional parts from some row and the $t_j$ are the current nonbasic variables. Alternatively, the $f_j$ are the fractional parts obtained by combining (modulo 1) several of the fractional part rows, or by multiplying a row by an integer. For rules of choice see Section 5. Here by the fractional part $f$ of an element $a$ we mean the number obtained if $a$ is written as $n + f$ with $n$ an integer and $0 \le f < 1$.
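The Python sketch below illustrates the two operations in this paragraph. It assumes that a row is a list of exact rationals with the constant first. `cut_row` forms the row of the new equation $s = -f_0 - \sum f_j(-t_j)$. `inequality_group` lists all integer multiples of one row's fractional parts modulo 1. Both names are hypothetical. With the $x_3$ and $x_2$ rows of the optimal array of Example 1, the sketch reproduces the cut $(7, 1, 7, 0)/10$ and the ten elements of the group F listed there.

```python
import math
from fractions import Fraction


def frac_part(x):
    """Fractional part f of x, where x = n + f, n integer, 0 <= f < 1."""
    x = Fraction(x)
    return x - math.floor(x)


def cut_row(row):
    """Condensed-array row of the new equation s = -f_0 - sum_j f_j (-t_j)."""
    return [-frac_part(x) for x in row]


def inequality_group(row):
    """All multiples k * (fractional parts of row), reduced modulo 1.

    One generating row gives a cyclic group; combining several rows
    would be needed in general.
    """
    base = [frac_part(x) for x in row]
    seen, k = [], 1
    while True:
        v = tuple(frac_part(k * f) for f in base)
        if v in seen:
            return seen
        seen.append(v)
        k += 1


x3_row = [Fraction(7, 10), Fraction(-9, 10), Fraction(-3, 10), 1]
x2_row = [Fraction(23, 10), Fraction(-1, 10), Fraction(3, 10), 0]
print([str(v) for v in cut_row(x3_row)])   # ['-7/10', '-1/10', '-7/10', '0']
print(len(inequality_group(x2_row)))       # 10
```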

After such an equation is added, the array is not primal feasible, so 
further steps of the simplex method (usually the dual method) are made 
until the array is again optimal. This process is then repeated until an 
integer answer (integer column of constants) or an all integer matrix 
(whichever is desired) is obtained. 

S variables can be dropped whenever they emerge from the nonbasic 
set, and of course many variations on the above procedure are possible. 

In order to have a fixed procedure for these examples, inequalities will be added one at a time, and the fractional parts of some chosen row in the matrix will be used directly. The row will be chosen as follows. Select the column with the smallest entry in the top row ($\min a_{0,j}$, $j \ge 1$). Then select the row having the smallest fractional part in that column. (An all-integer row is not considered.) This choice of row is an attempt to get a deep "cut" in the direction of least rapid decrease of $z$. The row used is marked by an arrow in each case.
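The row-choice rule can be sketched in the same way. In this Python sketch, two choices are assumptions: the $z$ row is allowed to compete, and rows whose entry in the chosen column is already integral are skipped. `choose_cut_row` is a hypothetical name. On the optimal array of Example 1 with $|D| = 10$, the sketch picks the $x_3$ row, whose fractional parts give the cut used there.

```python
import math
from fractions import Fraction


def frac_part(x):
    x = Fraction(x)
    return x - math.floor(x)


def choose_cut_row(a):
    """Index of the row whose fractional parts generate the next cut, or None."""
    # Column with the smallest top-row entry, j >= 1.
    j0 = min(range(1, len(a[0])), key=lambda j: a[0][j])
    # Rows with a fractional entry in that column (assumption: row 0 included).
    candidates = [i for i in range(len(a)) if frac_part(a[i][j0]) != 0]
    if not candidates:
        return None
    return min(candidates, key=lambda i: frac_part(a[i][j0]))


F = Fraction
optimal = [[F(194, 10), F(2, 10), F(4, 10), 1],    # z
           [F(18, 10), F(4, 10), F(-2, 10), 0],    # x1
           [F(23, 10), F(-1, 10), F(3, 10), 0],    # x2
           [F(7, 10), F(-9, 10), F(-3, 10), 1]]    # x3
print(choose_cut_row(optimal))  # 3
```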

The pivot element is marked by an asterisk. 

The entire inequality group will be listed for some arrays. 

Example 1.

Maximize $z = 4x_1 + 5x_2 + x_3$
Subject to:
$3x_1 + 2x_2 \le 10$
$x_1 + 4x_2 \le 11$
$3x_1 + 3x_2 + x_3 \le 13$

Introduce slack variables $\bar x_1, \bar x_2, \bar x_3$.

Starting array, primal feasible.

            1      -x1      -x2      -x3
z   =       0       -4       -5       -1
x̄1  =      10        3        2        0
x̄2  =      11        1        4*       0
x̄3  =      13        3        3        1

|D| = 4.

            1      -x1      -x̄2      -x3
z   =    55/4    -11/4      5/4       -1
x̄1  =    18/4     10/4*    -2/4        0
x2  =    11/4      1/4      1/4        0
x̄3  =    19/4      9/4     -3/4        1

|D| = 4 × 10/4 = 10.

            1      -x̄1      -x̄2      -x3
z   =  187/10    11/10     7/10       -1
x1  =   18/10     4/10    -2/10        0
x2  =   23/10    -1/10     3/10        0
x̄3  =    7/10    -9/10    -3/10        1*

|D| = 10. Optimal.

            1      -x̄1      -x̄2      -x̄3
z   =  194/10     2/10     4/10        1
x1  =   18/10     4/10    -2/10        0
x2  =   23/10    -1/10     3/10        0
x3  =    7/10    -9/10    -3/10        1   ←
s1  =   -7/10    -1/10    -7/10*       0

Solution: z = 19 4/10, x1 = 1 8/10, x2 = 2 3/10, x3 = 7/10.

Inequality Group F (components in units of 1/10):

(3, 9, 3, 0)   (8, 4, 8, 0)*
(6, 8, 6, 0)   (1, 3, 1, 0)
(9, 7, 9, 0)   (4, 2, 4, 0)
(2, 6, 2, 0)   (7, 1, 7, 0)*
(5, 5, 5, 0)   (0, 0, 0, 0)

In the original coordinates the inequality

$s_1 = -\tfrac{7}{10} + \tfrac{1}{10}\bar x_1 + \tfrac{7}{10}\bar x_2 \ge 0$

becomes the new integer inequality

$\tfrac{1}{10}(10 - 3x_1 - 2x_2) + \tfrac{7}{10}(11 - x_1 - 4x_2) \ge \tfrac{7}{10}$, or $x_1 + 3x_2 \le 8$.

|D| = 7. Optimal.

            1      -x̄1      -s1      -x̄3
z   =      19      1/7      4/7        1   ←
x1  =       2      3/7     -2/7        0
x2  =       2     -1/7      3/7        0
x3  =       1     -6/7     -3/7        1
x̄2  =       1      1/7    -10/7        0
s2  =       0     -1/7*    -4/7        0

Integer solution is: z = 19, x1 = 2, x2 = 2, x3 = 1.

F (components in units of 1/7):

(0, 1, 4, 0)   (0, 5, 6, 0)
(0, 2, 1, 0)   (0, 6, 3, 0)
(0, 3, 5, 0)   (0, 0, 0, 0)
(0, 4, 2, 0)

|D| = 1. Integer matrix.

            1      -s2      -s1      -x̄3
z   =      19        1        0        1
x1  =       2        3       -2        0
x2  =       2       -1        1        0
x3  =       1       -6        3        1
x̄2  =       1        1       -2        0
x̄1  =       0       -7        4        0
Example 2.

Maximize $z = 3x_1 - x_2$
Subject to:
$3x_1 - 2x_2 \le 3$
$-5x_1 - 4x_2 \le -10$
$2x_1 + x_2 \le 5$

            1      -x1      -x2
z   =       0       -3        1
x̄1  =       3        3*      -2
x̄2  =     -10       -5       -4
x̄3  =       5        2        1

|D| = 3.

            1      -x̄1      -x2
z   =       3        1       -1
x1  =       1      1/3     -2/3
x̄2  =      -5      5/3    -22/3
x̄3  =       3     -2/3      7/3*

F (components in units of 1/3): (0, 1, 1)   (0, 2, 2)   (0, 0, 0)

|D| = 7. Optimal.

            1      -x̄1      -x̄3
z   =    30/7      5/7      3/7
x1  =    13/7      1/7      2/7   ←
x2  =     9/7     -2/7      3/7
x̄2  =    31/7     -3/7     22/7
s1  =    -6/7     -1/7     -2/7*

Solution: z = 4 2/7, x1 = 1 6/7, x2 = 1 2/7.

F (components in units of 1/7):

(6, 1, 2)   (2, 5, 3)
(5, 2, 4)   (1, 6, 5)
(4, 3, 6)   (0, 0, 0)
(3, 4, 1)

|D| = 2. Dual feasible.

            1      -x̄1      -s1
z   =       3      1/2      3/2
x1  =       1        0        1
x2  =       0     -1/2      3/2
x̄2  =      -5       -2*      11
x̄3  =       3      1/2     -7/2

F (components in units of 1/2): (0, 1, 1)   (0, 0, 0)

|D| = 4. Optimal.

            1      -x̄2      -s1
z   =     7/4      1/4     17/4   ←
x1  =       1        0        1
x2  =     5/4     -1/4     -5/4
x̄1  =    10/4     -2/4    -22/4
x̄3  =     7/4      1/4     -3/4
s2  =    -3/4     -1/4*    -1/4

F (components in units of 1/4): (3, 1, 1)   (2, 2, 2)   (1, 3, 3)   (0, 0, 0)

|D| = 1. Optimal. Integer Matrix.

            1      -s2      -s1
z   =       1        1        4
x1  =       1        0        1
x2  =       2       -1       -1
x̄1  =       4       -2       -5
x̄3  =       1        1       -1
x̄2  =       3       -4        1

Solution: z = 1, x1 = 1, x2 = 2.

Example 3. 

This problem illustrates one way of doing these computations in integers throughout. Each $a_{i,j}$ is written as $A_{i,j}/|D|$ with $A_{i,j}$ (the numerator) an integer. Since the common $|D|$ is known, only the $A_{i,j}$ are written in each array. The rules for pivot choice are the same as before, and the $A_{i,j}$ can be used in place of the $a_{i,j}$, as only ratios are involved. Pivoting on $A_{i_0,j_0}$ produces new $A'_{i,j}$ and a new $D'$ as follows:

$$|D'| = |A_{i_0,j_0}|,$$

$$A'_{i,j} = \pm\,\frac{A_{i,j}\,A_{i_0,j_0} - A_{i,j_0}\,A_{i_0,j}}{|D|}, \qquad i \ne i_0,\ j \ne j_0,$$

the plus sign being used if $A_{i_0,j_0}$ is positive, the minus if it is negative. Since all $2 \times 2$ subdeterminants of the array of $A_{i,j}$ are divisible by $D$, the division involved in getting $A'_{i,j}$ always produces an integer.
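The numerator update can be sketched in Python with plain integers. The off-pivot formula is the one stated for $A'_{i,j}$. The sketch also needs rules for the pivot row ($\pm A_{i_0,j}$), the pivot column ($\mp A_{i,j_0}$) and the pivot element ($\pm|D|$). These three rules are derived from the fractional pivot rules and are assumptions of this sketch, although they reproduce the arrays of this example. `integer_pivot` is a hypothetical name, and the `assert` checks the divisibility claim made in the text.

```python
def integer_pivot(A, D, i0, j0):
    """Pivot an array of integer numerators A[i][j] over a common |D| > 0.

    Returns (new_array, new_D), with new_D = |A[i0][j0]|.
    """
    piv = A[i0][j0]
    sign = 1 if piv > 0 else -1
    out = []
    for i, row in enumerate(A):
        new_row = []
        for j, x in enumerate(row):
            if i == i0 and j == j0:
                new_row.append(sign * D)             # pivot element
            elif i == i0:
                new_row.append(sign * x)             # pivot row
            elif j == j0:
                new_row.append(-sign * x)            # pivot column
            else:
                num = x * piv - row[j0] * A[i0][j]   # 2 x 2 subdeterminant
                assert num % D == 0, "subdeterminant not divisible by |D|"
                new_row.append(sign * num // D)
        out.append(new_row)
    return out, abs(piv)


# Example 3: rows z, x̄1, x̄2; columns 1, -x1, ..., -x5.
A = [[0, -1, -2, -3, -1, -1],
     [41, 1, 0, 4, 2, 1],
     [47, 4, 3, 1, -4, -1]]
A, D = integer_pivot(A, 1, 1, 5)     # pivot 1, |D| stays 1
A, D = integer_pivot(A, D, 2, 2)     # pivot 3, |D| = 3
A, D = integer_pivot(A, D, 1, 4)     # pivot 6, |D| = 6
print(D, A[0])                       # 6 [639, 21, 4, 30, 1, 11]
A.append([-3, -3, -4, 0, -1, -5])    # cut from the fractional parts of the z row
A, D = integer_pivot(A, D, 3, 4)     # pivot -1, |D| = 1
print(D, A[0])                       # 1 [106, 3, 0, 5, 1, 1]
```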
Maximize $z = x_1 + 2x_2 + 3x_3 + x_4 + x_5$
Subject to:
$x_1 + 4x_3 + 2x_4 + x_5 \le 41$
$4x_1 + 3x_2 + x_3 - 4x_4 - x_5 \le 47$

Primal Feasible.

            1     -x1     -x2     -x3     -x4     -x5
z   =       0      -1      -2      -3      -1      -1
x̄1  =      41       1       0       4       2       1*
x̄2  =      47       4       3       1      -4      -1

|D| = 1.

            1     -x1     -x2     -x3     -x4     -x̄1
z   =      41       0      -2       1       1       1
x5  =      41       1       0       4       2       1
x̄2  =      88       5       3*      5      -2       1

|D| = 3.

            1     -x1     -x̄2     -x3     -x4     -x̄1
z   =     299      10       2      13      -1       5
x5  =     123       3       0      12       6*      3
x2  =      88       5       1       5      -2       1

|D| = 6. Optimal.

            1     -x1     -x̄2     -x3     -x5     -x̄1
z   =     639      21       4      30       1      11   ←
x4  =     123       3       0      12       3       3
x2  =     258      12       2      18       2       4
s1  =      -3      -3      -4       0      -1*     -5

Solution: z = 106 1/2, x1 = 0, x2 = 43, x3 = 0, x4 = 20 1/2, x5 = 0.

F (components in units of 1/6):

(3, 3, 4, 0, 1, 5)   (0, 0, 4, 0, 4, 2)
(0, 0, 2, 0, 2, 4)   (3, 3, 2, 0, 5, 1)
(3, 3, 0, 0, 3, 3)   (0, 0, 0, 0, 0, 0)

Additional inequalities (components in units of 1/6):

(6, 6, 2, 6, 2, 4)   (6, 6, 4, 6, 4, 2)   (6, 6, 6, 6, 6, 6)

|D| = 1. Optimal. Integer Matrix.

            1     -x1     -x̄2     -x3     -s1     -x̄1
z   =     106       3       0       5       1       1
x4  =      19      -1      -2       2       3      -2
x2  =      42       1      -1       3       2      -1
x5  =       3       3       4       0      -6       5

Solution: z = 106, x1 = 0, x2 = 42, x3 = 0, x4 = 19, x5 = 3.
Integer Quadratic Programming†



Hans P. Kunzi 
Werner Oettli 

1. INTRODUCTION 

A number of procedures for solving both quadratic programming prob- 
lems and integer programming problems now exist [1, 2, 3, 5]. Here we 
consider the combined problem. That is, we add to the ordinary constraints 
of a quadratic programming problem the requirement that the variables be 
integers. An approach to the general nonlinear-integral problem has been
suggested by Kelly [4]. The procedure given here is designed to take ad- 
vantage of the special properties enjoyed by problems with a convex quad- 
ratic objective function and linear constraints. 

The problem in the form we shall consider it is:

minimize $Q(x) = p'x + \tfrac{1}{2}x'Cx$
subject to $Ax \le b$
$x \ge 0$
$x$ has integral components   (1-1)

In the above, $p$ is a given $n$-vector, $b$ is a given $m$-vector, $x$ is an $n$-vector to be determined, $C$ is an $n$ by $n$ positive definite matrix, and $A$ is an $m$ by $n$ matrix. Transposes are denoted by primes. It is assumed that the set of all points $x$ satisfying the constraints is bounded.



2. THE GEOMETRICAL INTERPRETATION 

It is convenient to describe the process geometrically and illustrate it in two dimensions. Suppose that the constraint set is the (convex) polyhedron shown in Fig. 1. The point $x_0$ is the center of the family of ellipsoids $Q(x)$ = constant; that is, it is the point at which $Q(x)$ assumes its free (unconstrained) minimum. The problem is to find the smallest ellipsoid which passes through a lattice point lying in the polygon.

This will be done by solving a sequence of mixed integer problems. We 
should caution, however, that the objective function in these mixed integer 



†We are indebted to R. L. Graves for a number of improvements in the present version.

Fig. 1

problems is not itself required to be an integer so that our procedure 
yields only approximate solutions with present techniques for mixed 
integer problems. 

To continue with the description, we first determine $\hat x_1$, the point at which the smallest ellipsoid, $Q(x)$ = constant, just touches the polygon. Since $C$ is positive definite, $Q$ is strictly convex and this point is unique. A next natural step would be to dilate the ellipsoid $Q(x) = Q(\hat x_1)$, relative to $x_0$, until it first passes through an integer point of the polygonal domain. The iteration procedure we propose to adopt instead consists in the dilation not of this ellipsoid but rather of a polyhedral approximation of it. At step $k$, this polyhedron, $P_k$, has $k$ faces which are tangent to the ellipsoid. At the first step, $P_1$ has one face. The polyhedron, then, is dilated until it meets a lattice point of the given polygonal domain. The "stretching parameter," $\lambda$, in terms of which the dilation is expressed, determines a unique ellipsoid, say $Q(\lambda)$, which is, of course, tangent to the dilated polyhedron, $P_k(\lambda)$. Let $x_{k+1}$ (at the first step this is $x_2$) be the lattice point encountered upon dilating $P_k$. Now $x_0$ and $x_{k+1}$ determine a line which meets the ellipsoid $Q(x) = Q(\hat x_1)$ in a point. Call this point $\hat x_{k+1}$. We now construct the line (more generally a hyperplane) which is tangent to the ellipsoid at $\hat x_{k+1}$, and use this line to form a new approximating polyhedron, $P_{k+1}$, which has $k + 1$ faces.

Now if $\hat x_{k+1} = x_{k+1}$, then the problem has been solved. This is true because the tangent polyhedron in which the ellipsoid is imbedded has passed through no lattice points in the dilation process, and hence the ellipsoid itself certainly hasn't. Otherwise the new polyhedron, $P_{k+1}$, is properly contained in $P_k$. We now repeat the process by dilating $P_{k+1}$ to obtain a lattice point $x_{k+2}$. The procedure terminates in an optimal solution when a lattice point appears for the second time, because the point can be reached only by the hyperplane which touched it before, and this point lies on the dilated ellipsoid.
3. THE ALGEBRAIC TREATMENT

Here we shall give a more formal presentation and exhibit explicitly the sequence of steps. The problem we wish to solve has the same solutions (the solution need not be unique) as the following problem:

minimize $\lambda$
subject to $Ax \le b$
$x \ge 0$
$x$ integral
$t'(x - x_0) \le \lambda\, t'(\hat x - x_0)$   (3-1)

The last inequality is to hold for all $\hat x$ such that $Q(\hat x) = Q(y)$, where $y$ is the (unique) solution to

minimize $Q(x)$
subject to $Ax \le b$
$x \ge 0$

The value of the $t$ which is associated with an $\hat x$ is given by $t = p + C\hat x$.

To prove that a solution $x$ to (1-1) is indeed a solution to (3-1), and to facilitate the exposition of the constructive procedure, we need the following sequence of lemmas. In them $x_0$ denotes the free minimum of $Q(x)$. We suppose it to be outside the constraint set. It is elementary to show that $x_0 = -C^{-1}p$.

Lemma 1. $t'(\hat x - x_0) = 2[Q(\hat x) - Q(x_0)]$.

Proof: $t'(\hat x - x_0) = (p' + \hat x'C)(\hat x - x_0)$. Now $Cx_0 = -p$ and $Q(x_0) = \tfrac{1}{2}p'x_0$, and these substitutions give the desired result.

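Definition: Given a vector $x_a$, let

$$\mu_a = \left[\frac{Q(y) - Q(x_0)}{Q(x_a) - Q(x_0)}\right]^{1/2}$$

and

$$\hat x_a = x_0 + \mu_a(x_a - x_0).$$

Lemma 2. $Q(\hat x_a) = Q(y)$. (That is, $\hat x_a$ lies on the undilated ellipsoid.)

Proof: $Q(\hat x_a) = Q((1 - \mu_a)x_0 + \mu_a x_a) = (1 - \mu_a^2)Q(x_0) + \mu_a^2 Q(x_a) = Q(y)$. The term linear in $\mu_a$ vanishes because the gradient of $Q$ is zero at the free minimum $x_0$.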
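A quick numerical check of Lemmas 1 and 2 may reassure the reader. The Python sketch below uses a hypothetical two-dimensional instance, $Q(x) = -4x_1 - 6x_2 + x_1^2 + x_2^2$, so that $x_0 = (2, 3)$. It takes $Q(y) = Q(1, 1) = -8$ as the level of the undilated ellipsoid and projects $x_a = (0, 0)$ radially onto it. All helper names are illustrative.

```python
import math


def Q(p, C, x):
    """Q(x) = p'x + (1/2) x'Cx."""
    n = len(x)
    return (sum(p[i] * x[i] for i in range(n))
            + 0.5 * sum(x[i] * C[i][j] * x[j] for i in range(n) for j in range(n)))


def gradient(p, C, x):
    """t = p + Cx."""
    return [p[i] + sum(C[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]


def free_minimum_2d(p, C):
    """x0 = -C^{-1} p, written out for a 2 x 2 positive definite C."""
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    return [-(C[1][1] * p[0] - C[0][1] * p[1]) / det,
            -(-C[1][0] * p[0] + C[0][0] * p[1]) / det]


def radial_point(p, C, x0, xa, q_level):
    """x_hat = x0 + mu (xa - x0), with mu^2 = (q_level - Q(x0)) / (Q(xa) - Q(x0))."""
    q0 = Q(p, C, x0)
    mu = math.sqrt((q_level - q0) / (Q(p, C, xa) - q0))
    return [x0[i] + mu * (xa[i] - x0[i]) for i in range(len(xa))]


p, C = [-4.0, -6.0], [[2.0, 0.0], [0.0, 2.0]]
x0 = free_minimum_2d(p, C)                       # [2.0, 3.0]
q_level = Q(p, C, [1.0, 1.0])                    # plays the role of Q(y)
x_hat = radial_point(p, C, x0, [0.0, 0.0], q_level)
t = gradient(p, C, x_hat)
lemma1_lhs = sum(t[i] * (x_hat[i] - x0[i]) for i in range(2))
print(Q(p, C, x_hat), q_level)                               # Lemma 2: equal up to rounding
print(lemma1_lhs, 2 * (Q(p, C, x_hat) - Q(p, C, x0)))        # Lemma 1: equal up to rounding
```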
Lemma 3. Given a vector $x_a$, the problem

maximize $t'(x_a - x_0)$
subject to $Q(\hat x) = Q(y)$

has the solution $\hat x = \hat x_a$.

Proof: $t'(x_a - x_0) = Q(x_a) + Q(\hat x) - 2Q(x_0) - \tfrac{1}{2}(x_a - \hat x)'C(x_a - \hat x)$. The only variable part of this expression involves the quadratic form $C$. This form assumes a minimum value when $\hat x = \hat x_a$, and hence its negative is a maximum there.

This shows that for a proposed solution to (3-1), say $x_a$, we need only verify one "t" inequality, namely for $\hat x = \hat x_a$. The left side of the relation assumes its greatest value for this value of $\hat x$, and the right side is a constant (by Lemma 1 it equals $2\lambda[Q(y) - Q(x_0)]$).

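Lemma 4. Let $x_b, \lambda_b$ be a solution to (3-1). Then $\lambda_b = \mu_b^{-1}$.

Proof: $\hat x_b - x_0 = \mu_b(x_b - x_0)$, so

$t_b'(\hat x_b - x_0) = \mu_b\, t_b'(x_b - x_0)$,
$t_b'(x_b - x_0) = \lambda_b\, t_b'(\hat x_b - x_0)$.

The last equality is true because the left side assumes its largest value for $\hat x = \hat x_b$ and $\lambda_b$ is a minimum. Combining the two relations gives $\lambda_b \mu_b = 1$.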

Theorem. Let a solution to (1-1) be denoted by $x_a$. Let $\lambda_a = [t_a'(x_a - x_0)]/[t_a'(\hat x_a - x_0)]$. Then $x_a, \lambda_a$ is a solution to (3-1). Conversely, let $x_b, \lambda_b$ be a solution to (3-1). Then $x_b$ is a solution to (1-1).

Proof: Given $x_a$ and $\lambda_a$, the hypothesis and Lemma 3 yield

$t_a'(x_a - x_0) = \lambda_a\, t_a'(\hat x_a - x_0)$
$t'(x_a - x_0) \le \lambda_a\, t'(\hat x - x_0)$

Hence $x_a, \lambda_a$ is a solution to (3-1). Given $x_b, \lambda_b$, we must have $\lambda_b = \lambda_a$, because $x_a, \lambda_a$ is a solution to (3-1). Using Lemma 4, it is easy to show that

$Q(x_b) = (1 - \lambda_b^2)\,Q(x_0) + \lambda_b^2\,Q(y)$.

A similar expression can be written for $Q(x_a)$. Since $Q(\hat x_b) = Q(y)$ and $\lambda_b = \lambda_a$, $Q(x_b) = Q(x_a)$. This completes the proof.

Turning now to the constructive procedure, the first step is to find the location of the free minimum of $Q(x)$. As indicated before, this is

$x_0 = -C^{-1}p$.

Then $\hat x_1$ is the solution to the quadratic programming problem

minimize $Q(x) = p'x + \tfrac{1}{2}x'Cx$
subject to $Ax \le b$
$x \ge 0$

Then set $x_1 = \hat x_1$.

We wish to determine a sequence of integer-valued $x_k$ which satisfy $Ax_k \le b$ and lie on the dilations of hyperplanes tangent to the ellipsoid $Q(x) = Q(\hat x_1)$. We also want an accompanying sequence of $\hat x_k$ which lie on the ellipsoid and on the line joining $x_k$ to $x_0$.

The general step in the procedure can be expressed as follows. Given the points $x_j$ and $\hat x_j$ for $1 \le j \le k$, solve the mixed integer problem

minimize $\lambda$
subject to $Ax \le b$
$t_j'(x - x_0) \le \lambda\, t_j'(\hat x_j - x_0)$;   $(j = 1, 2, \ldots, k)$
$x \ge 0$, $\lambda \ge 0$
$x$ integral, $\lambda$ arbitrary

In this problem, $t_j$ is the gradient vector given by

$t_j = p + C\hat x_j$.

If we let

$\alpha_j = t_j'(\hat x_j - x_0)$, $\quad \beta_j = t_j'x_0$,

then the problem can be expressed more succinctly as

minimize $\lambda$
subject to $Ax \le b$
$(t_j'x - \beta_j)/\alpha_j \le \lambda$;   $(j = 1, 2, \ldots, k)$
$x \ge 0$, $\lambda \ge 0$
$x$ integral, $\lambda$ arbitrary

If the solution to this problem is denoted by $x_{k+1}, \lambda_{k+1}$, then $\hat x_{k+1}$ is given by

$$\hat x_{k+1} = x_0 + \left[\frac{Q(\hat x_1) - Q(x_0)}{Q(x_{k+1}) - Q(x_0)}\right]^{1/2}(x_{k+1} - x_0).$$



Now suppose we find a vector, $x_m$, and scalar, $\lambda_m$, satisfying merely the one additional "t" constraint

$t_m'(x_m - x_0) \le \lambda_m\, t_m'(\hat x_m - x_0)$.

Then Lemma 3 ensures that we have a solution, because all of the other "t" constraints will be satisfied. Such a solution will be available if at some stage $x_m = x_k$ for $m > k$, because $x_m$ satisfies the earlier $k$th relation

$t_k'(x_m - x_0) \le \lambda_m\, t_k'(\hat x_k - x_0)$

and $\hat x_m = \hat x_k$, $t_m = t_k$ (ordinarily $\lambda_m > \lambda_k$). There are only a finite number of integer points to consider, since the constraint set is bounded. Thus at some stage $x_m = x_k$. The only reason for choosing $x_1$ as the solution to the noninteger problem is to ensure that an integral point in the constraint set will be found in the process of moving the tangent hyperplane from $x_1$. It is perfectly possible to start with an integral point in the constraint set. It is worth observing that this procedure adds additional constraints, or cutting planes, just as ordinary integer programming does. (In the end only one of them is needed.) Thus the dual method can be used as it is in ordinary integer programming.
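For very small problems the whole iteration can be imitated by brute force. In the Python sketch below, enumerating a finite list of feasible lattice points replaces the mixed integer subproblem. This is a swapped-in technique, not the authors' method. The continuous optimum $\hat x_1$ is supplied by hand, and the loop stops when a lattice point is selected for the second time, as described above. The instance and the function name `tangent_cut_search` are hypothetical.

```python
import math
from itertools import product


def Q(p, C, x):
    n = len(x)
    return (sum(p[i] * x[i] for i in range(n))
            + 0.5 * sum(x[i] * C[i][j] * x[j] for i in range(n) for j in range(n)))


def tangent_cut_search(p, C, x0, x_hat1, lattice_points):
    """Return the first lattice point selected twice by the dilation loop.

    Brute-force enumeration of lattice_points stands in for the mixed
    integer subproblem; x0 (free minimum) and x_hat1 (continuous
    constrained minimum) are given by the caller.
    """
    n = len(x0)
    q0, q1 = Q(p, C, x0), Q(p, C, x_hat1)
    faces = []                                   # (t_j, alpha_j) for the faces of P_k

    def add_face(x_hat):
        t = [p[i] + sum(C[i][j] * x_hat[j] for j in range(n)) for i in range(n)]
        alpha = sum(t[i] * (x_hat[i] - x0[i]) for i in range(n))
        faces.append((t, alpha))

    def dilation(x):                             # smallest lambda with x in P_k(lambda)
        return max(sum(t[i] * (x[i] - x0[i]) for i in range(n)) / alpha
                   for t, alpha in faces)

    add_face(x_hat1)
    seen = set()
    while True:
        x = min(lattice_points, key=dilation)    # first minimizer in list order
        if x in seen:
            return x
        seen.add(x)
        mu = math.sqrt((q1 - q0) / (Q(p, C, x) - q0))
        add_face([x0[i] + mu * (x[i] - x0[i]) for i in range(n)])


# Hypothetical instance: minimize -4x1 - 6x2 + x1^2 + x2^2
# subject to x1 + x2 <= 3.5, x >= 0, x integral.
p, C = [-4.0, -6.0], [[2.0, 0.0], [0.0, 2.0]]
x0 = [2.0, 3.0]                                  # -C^{-1} p
x_hat1 = [1.25, 2.25]                            # projection of x0 onto x1 + x2 = 3.5
pts = [x for x in product(range(4), repeat=2) if x[0] + x[1] <= 3.5]
print(tangent_cut_search(p, C, x0, x_hat1, pts))   # (1, 2)
```

On this instance the first dilation selects (0, 3) and the second selects (1, 2), whose radial point coincides with $\hat x_1$. The third dilation selects (1, 2) again, which ends the loop. A direct minimization of $Q$ over the same points gives the same answer.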

There are serious approximation problems, since this is a mixed integer problem with a nonintegral objective function. For the mixed integer finiteness proofs to apply, it is necessary to replace the functional by $N\lambda$ (say $N = 1000$) and require that $N\lambda$ be an integer. Then the procedures of either Gomory or Dinkelbach can be applied.
ON DIAGONALIZATION METHODS IN INTEGER PROGRAMMING

Richard Van Slyke and Roger Wets 

ABSTRACT 

An important area for improvement in existing integer programming 
codes is in the easy generation of efficient cutting hyperplanes; here we 
approach this problem using a triangular canonical form. First an algo- 
rithm is given based on Gomory's all-integer integer programming algo- 
rithm, which constitutes a first step in this direction. This procedure is a 
practical analog of a deepest cut method discussed in the second part. 
Finally, a brief outline and flow diagram for the algorithm are given and 
illustrated. 

We assume that we have at hand an integer program where all the coefficients and constant terms are integers. The functional is to be maximized. The problem can be written in a parametric form due to Tucker:

Maximize $x_0$ subject to $x_j$ integer, $j = 0, \ldots, n$; $x_j \ge 0$, $j = 1, \ldots, n$; and

$x_0 = b_0 + c_0t_0 + c_1t_1 + \cdots + c_kt_k$
$x_1 = b_1 + a_{10}t_0 + a_{11}t_1 + \cdots + a_{1k}t_k$
$\vdots$
$x_m = b_m + a_{m0}t_0 + a_{m1}t_1 + \cdots + a_{mk}t_k$   (1)

Then a simple transformation is made to the following equivalent problem:

Maximize $x_0$, where $x_j \ge 0$ $(j = 1, \ldots, n)$ and the $x_j$ are integers $(j = 0, \ldots, n)$, subject to

$x_0 = b_0 - t_0$
$x_1 = b_1 + a_{10}t_0 - t_1$
$x_2 = b_2 + a_{20}t_0 + \cdots$
$\vdots$
$x_n = b_n + a_{n0}t_0 + a_{n1}t_1 + a_{n2}t_2 + \cdots + a_{n,k+1}t_{k+1}$   (2)
Because the columns of (2) are ordered lexicographically, "efficient cuts" can be easily found.

A simplified variant of Gomory's all-integer integer programming algo- 
rithm leads to the solution in a finite number of iterations. 



An Accelerated Euclidean Algorithm for Integer Linear Programming



Glenn T. Martin 

THE PROBLEM 

Many optimization problems may be formulated as linear programming 
problems in which the solution variables must take on integer values. 
Development of linear programming into a routine mathematical tool 
has intensified the search for associated computational procedures to 
handle the integer programming problem. Until recently, however, prog- 
ress has been meager both in the area of mathematical theory and in the 
area of application. But in 1958, R. E. Gomory [1] proposed a rigorous 
solution procedure which was shown to converge in a finite number of 
steps. Later he proposed a modification of this method as an "all-integer
algorithm" [2]. Both these techniques make use of the simplex procedure
and apply principles of the Euclidean Algorithm [3]. In each case a set of
suitable "cutting planes" is applied to the system in such a way that non- 
integer solutions are occluded and the desired integer solution ultimately 
found. 

Both of Gomory's techniques have met with limited success in application. Nonetheless, many computational difficulties have been encountered even on very small problems. Larger problems, very modest by normal linear programming standards, often persistently refuse to converge. It seems, therefore, that a means to accelerate convergence is necessary if reasonably complex problems are to be routinely solved.

An Accelerated Euclidean Algorithm has been investigated as a means 
of reducing computational effort involved in solving integer linear pro- 
gramming problems. The procedure is a direct extension of Gomory's
original proposal. The technique will be illustrated and modifications 
to the earlier technique pointed out by use of a small example. 



Example Problem

Minimize

$z = -2x_1 - 3x_2$   (1)

Subject to

$2x_1 + 5x_2 \le 8$
$3x_1 + 2x_2 \le 9$   (2)

and

$x_1, x_2 = 0, 1, 2, \ldots$   (3)

The variables $x_3$ and $x_4$ are added as slack variables, and the initial array is shown in Fig. 1. The identity columns are omitted, and basis vectors are indicated by row identification.

Procedure 

A. Apply the (composite) simplex algorithm to (1) and (2). If (3) is met, the desired solution has been attained; otherwise

B. Apply the Accelerated Euclidean Algorithm, steps B1 to B7, until an integer solution is attained, then reapply the simplex algorithm as in step A. Thus:

1. Select a noninteger row vector, p, from the set.

2. Abstract the "Gomory" restraint, s, from p. This consists of the positive fractional components of the elements of p.

3. Determine k, the dual simplex pivot column for the row s.

4. Compute p′, the Gaussian transform of s on p only (in contrast to the earlier procedure, which transformed the entire set).

5. Continue to abstract s′ restraints and transform p′ until the k component of p′ becomes integer. Note that the index k, as determined in step B3, is retained throughout this process. A special algorithm has been devised for carrying out this step. (This also contrasts with the earlier procedure.)

6. Using the original restraint, p, and its final form, p″, perform a "reverse inversion," i.e., a reverse Gaussian transformation, to find the "consolidated restraint," r, which generates p″ from p.

7. Append r to the entire set and perform the indicated Gaussian transformation. Iterate this process beginning with step B1 until an all-integer solution is attained. Then return to the simplex algorithm (step A).

Accordingly, in our example, the simplex algorithm [4, 5] is applied, and the resulting optimal-feasible solution is displayed in Fig. 2. D is the determinant of the matrix, the product of all pivot elements. Since the solution is not integer, a noninteger row, called p, having the largest fractional component in b is selected. The modulo-one components are extracted and appended as s; however, the only use made of this row at this time is to select k, the dual simplex pivot column [6].

INITIAL MATRIX

         b      x1      x2
z        0       2       3
x3       8       2       5        |D| = 1
x4       9       3       2

Fig. 1

SIMPLEX SOLUTION 1

          b        x4        x3
z      -76/11    -4/11     -5/11
x2       6/11    -2/11      3/11
x1      29/11     5/11     -2/11
s       -7/11    -5/11     -9/11*

|D| = 11;   p = x1, k = x3

Fig. 2

Now p is subjected to accelerated reduction via the following algorithm. (Primes refer to the next subsequent stage of numbers.)

$a_p = I_p + a^+/|D|$
$a_{pk} = a^*/|D|$   (definition)
$a'_p = I_p + a^+\,(|D'| - a^*)/(|D|\,|D'|)$
$a'_{pk} = a^*/|D'|$
$I'_p = I_p$

Since $(|D'| - a^*)/|D|$ is an integer, it is convenient to split the p components, $a_p$, into integers, $I_p$, and positive fractional components, $a^+_p/|D|$. We then operate simply on the latter, splitting out any integers generated and adding them to the other integers. Finally, when $a_{pk}$ becomes integer, the residual components are added to form the transformed row, p″. This process is tabulated in Fig. 3. The row p″ is the result of applying five Gomory restraint stages to p.

ACCELERATED REDUCTION 1

                                  FRACTIONS                INTEGERS
|D|   |D'|   (|D'|-α*)/|DD'|    x3 (k)    b      x4        b     x4
 11     9         1/9            -2/9     7/9    5/9       2     0
  9     7         1/7            -2/7     7/7    5/7       2     0
  7     5         1/5            -2/5     0      5/5       3     0
  5     3         1/3            -2/3     0      0         3     1
  3     1         1/1            -2/1     0      0         3     1

          b     x4    x3
p" =      3     1     -2

Fig. 3
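The recurrence can be checked mechanically. The sketch below uses a hypothetical function name and exact `Fraction` arithmetic. It takes |D| to be the least common denominator of the row. Any common denominator yields the same transformed row, because each pass is simply one Gomory restraint derived from the current row and pivoted into column k; only the intermediate |D| values depend on the choice.

```python
from fractions import Fraction as F
import math

def accelerated_reduction(p, k):
    """Transform row p until its pivot-column entry p[k] becomes integer.

    Each pass is one Gomory restraint stage, computed with the recurrence
        |D'| = alpha* mod |D|
        a_j' = I_j + alpha_j * (|D'| - alpha*) / (|D| |D'|)   (j != k)
        a_k' = alpha* / |D'|
    where a_j = I_j + alpha_j / |D| with 0 <= alpha_j < |D|.
    Returns the final row and the number of stages used.
    """
    D = math.lcm(*(v.denominator for v in p))   # a common denominator for the row
    alpha_star = int(p[k] * D)                  # signed numerator of the pivot entry
    row = list(p)
    stages = 0
    while row[k].denominator != 1:
        D_new = alpha_star % D                  # strictly between 0 and |D|
        mult = F(D_new - alpha_star, D * D_new)
        for j, v in enumerate(row):
            if j != k:
                I = math.floor(v)
                alpha = (v - I) * D             # positive fractional numerator
                row[j] = I + alpha * mult
        row[k] = F(alpha_star, D_new)
        D = D_new
        stages += 1
    return row, stages

row, stages = accelerated_reduction([F(29, 11), F(5, 11), F(-2, 11)], 2)
```

On the Fig. 3 row this returns p" = (3, 1, -2) after five stages. The same function also reproduces the rows of Fig. 7 and of the positive-α* case in Fig. 10, where the floor handles the modulo-one treatment of negative fractions.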
Now we do a "reverse inversion" step; thus,



$$a_r = (a_p - a_p'')\,\frac{a_{rk}}{a_{pk}}, \qquad a_{rk} = -\frac{a_{pk}}{a_{pk}''}$$



This operation is summarized in Fig. 4. The resulting restraint and
its application to the system are shown in Fig. 5. Since the result is
integer, we return to the simplex procedure.
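The reverse inversion can be sketched the same way. A pivot on r in column k sends p_j to p_j - (p_k/r_k) r_j and p_k to -p_k/r_k. Setting these equal to p" and solving gives r_k = -p_k/p"_k and r_j = (p_j - p"_j) r_k/p_k. The helper name is hypothetical.

```python
from fractions import Fraction as F

def consolidated_restraint(p, p2, k):
    """Reverse inversion: the single restraint r whose pivot in column k maps p to p2 (= p")."""
    r_k = -F(p[k]) / p2[k]
    return [r_k if j == k else (F(p[j]) - p2[j]) * r_k / p[k]
            for j in range(len(p))]

r1 = consolidated_restraint([F(29, 11), F(5, 11), F(-2, 11)], [3, 1, -2], 2)
```

For the first example this gives r = (-2/11, -3/11, -1/11), the row of Fig. 4.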

The process of integerization rendered the solution nonoptimal; there-
fore, a new simplex solution is generated in Fig. 6. This is noninteger, so
we select a p row, extract s and determine k (in this case we have de-
liberately violated the criterion of selecting the largest fractional com-
ponent of b).

Figure 7 shows the accelerated reduction, and Fig. 8 shows the reverse 
inversion step for the new restraint. Its addition to the system and the 
result is displayed in Fig. 9. Since this satisfies both the simplex and the 
integer requirements, the final solution has been obtained. 
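Appending a restraint and performing the Gaussian transformation (step 7) amounts to an ordinary dual simplex pivot on the new row. The sketch below uses the same b - Σ a_j x_j convention and a hypothetical helper name. It rebuilds the second tableau of Fig. 5 from Fig. 2; the test also checks Fig. 9 against Fig. 6.

```python
from fractions import Fraction as F

def apply_restraint(rows, r, k):
    """Append restraint r and pivot it into column k (a dual simplex step).

    Rows follow basic = b - sum(a_j x_j). Afterwards column k holds the
    coefficients of the restraint's slack variable, and a final row is
    added for the variable that left column k.
    """
    rk = F(r[k])
    new_rows = []
    for row in rows:
        ratio = row[k] / rk
        new_rows.append([-ratio if j == k else row[j] - ratio * r[j]
                         for j in range(len(row))])
    new_rows.append([1 / rk if j == k else r[j] / rk for j in range(len(r))])
    return new_rows

fig2 = [[F(-76, 11), F(-4, 11), F(-5, 11)],    # z
        [F(6, 11), F(-2, 11), F(3, 11)],       # x2
        [F(29, 11), F(5, 11), F(-2, 11)]]      # x1
r1 = [F(-2, 11), F(-3, 11), F(-1, 11)]
fig5 = apply_restraint(fig2, r1, 2)    # rows z, x2, x1, x3
```

The x1 row of the result equals p" = (3, 1, -2), which confirms that the single consolidated restraint does the work of the five Gomory stages.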



REVERSE INVERSION 1

            b        x4       x3 (k)
p =       29/11     5/11    -2/11
p" =       3        1       -2*
p - p" =  -4/11    -6/11
r =       -2/11    -3/11    -1/11

Fig. 4
APPLICATION OF r 1

        b        x4       x3
z     -76/11   -4/11    -5/11
x2      6/11   -2/11     3/11
x1     29/11    5/11    -2/11
r      -2/11   -3/11    -1/11

        b        x4       s1
z      -6        1       -5
x2      0       -1        3
x1      3        1       -2
x3      2        3*     -11

Fig. 5



SIMPLEX SOLUTION 2

        b        x3       s1
z     -20/3    -1/3     -4/3
x2      2/3     1/3     -2/3
x1      7/3    -1/3      5/3
x4      2/3     1/3    -11/3
s      -1/3    -2/3*    -2/3

p = x1, k = x3

Fig. 6

ACCELERATED REDUCTION 2

                                  FRACTIONS                INTEGERS
|D|   |D'|   (|D'|-α*)/|DD'|    x3 (k)    b      s1        b     s1
  3     2         1/2            -1/2     1/2    2/2       2     1
  2     1         1/1            -1       1/1    0         2     2

          b     x3    s1
p" =      3     -1    2

Fig. 7
REVERSE INVERSION 2

            b       x3 (k)    s1
p =        7/3    -1/3       5/3
p" =       3      -1*        2
p - p" =  -2/3              -1/3
r =       -2/3    -1/3      -1/3

Fig. 8

Reduction of Positive α*

It is noted that in the accelerated reduction step, fractional components
are always interpreted modulo one at each stage. The signs of
(|D'| - α*)/|DD'| and of the fractions must be algebraically maintained. Any
negative fractions are interpreted modulo one and the resulting negative
integer is added to the integer component. Fig. 10 illustrates this with
positive α*.



APPLICATION OF r 2

        b        x3       s1
z     -20/3    -1/3     -4/3
x2      2/3     1/3     -2/3
x1      7/3    -1/3      5/3
x4      2/3     1/3    -11/3
r      -2/3    -1/3*    -1/3

        b        s2       s1
z      -6       -1       -1
x2      0        1       -1
x1      3       -1        2
x4      0        1       -4
x3      2       -3        1

Fig. 9

ACCELERATED REDUCTION WITH POSITIVE α*

          b       x1      x2
p =     11/4     1/4     7/4*

                                  FRACTIONS                INTEGERS
|D|   |D'|   (|D'|-α*)/|DD'|    x2 (k)    b      x1        b     x1
  4     3        -1/3             7/3     -3/3   -1/3      2     0
                                          0      2/3       1    -1
  3     1        -2/1             7/1     0      -4        1    -1

            b       x1      x2
p =       11/4     1/4     7/4
p" =       1      -5       7*
p - p" =   7/4    21/4
r =       -1/4    -3/4    -1/4

Fig. 10

DISCUSSION

It is evident that the strongest "cutting plane" available from a
particular row with a particular pivot column is likely to represent a
complex multiple-sum of the restraints immediately apparent in a non-
integer system. Since the original procedures for choosing a restraint
primarily examined only those fairly readily apparent, obviously the
prospect of a "good" choice was quite remote.

In selecting a noninteger row, we simply take the row containing the 
largest available fractional component of interest. It is not clear that this 
is necessary or even helpful with the Accelerated Euclidean Algorithm, 
i.e., a random choice may do as well, or better. We simply don't know. 

The procedure used here operates much like the previous technique;
however, (a) restraints are applied to only one row until the pivot column
element becomes integer, with later application to the entire set, and
(b) this process of applying restraints is allowed to destroy optimality as
well as feasibility. Thus a composite simplex algorithm is essential to
effective use of the new procedure.

Computing Experience 

We have solved only a handful of problems via this algorithm. We can
report, however, that for small problems which we had easily solved by
earlier methods, the new procedure has proved much more efficient in all
cases. Perhaps 75 per cent of the "integerizing" iterations were typically
saved (although this is not necessarily meaningful, because the problems
may not be representative). More importantly, problems which we found
impossible to handle with earlier procedures have generally yielded to
the new technique, provided digital difficulties could be avoided. In this
category is a 54 x 442 system, which is the largest problem we have
attacked.



CONCLUSION 

The Accelerated Euclidean Algorithm seems to offer computational ad- 
vantages over earlier Euclidean methods for integer linear programming. 
However, more comprehensive experience is needed before a thorough 
evaluation of this early promise can be made. 
Flows in Networks



D. R. Fulkerson 

This survey will summarize, in a very brief way, that part of linear 
programming theory encompassed by the phrase "transportation prob- 
lems" or "network flow problems." The latter name better describes the 
mathematical content of the subject and is less committed to one domain of 
application. This paper will not say too much about applications, but will 
instead stress some of the more important notions and theorems in this 
subject. 

Before getting into some of these concepts and theorems, a word should 
be said about the history of network flow problems. Just where the subject 
may properly be said to have started depends on how much latitude is 
allowed in interpreting the phrase "flows in networks." Certain static 
minimal cost transportation models were independently formulated and 
studied by Hitchcock [13], Kantorovitch [15, 16] and Koopmans [18] in the 
1940's. A few years later, when linear programming began to make itself 
known as an organized discipline, G. B. Dantzig showed how his simplex 
method could be simplified and made more effective for this class of prob- 
lems [1]. It would not be inaccurate to say that the subject really began 
with the work of these men on the very practical problem of transporting a 
commodity from certain points of supply to other points of demand in a way 
to minimize shipping cost. However, dismissing the formulational and 
applied aspects of the topic, and with the advantages of hindsight, one can 
go back a few years earlier to the work of P. Hall on set representatives [12],
or of König, Egerváry and Menger [17] on linear graphs, and relate this
work in pure mathematics to the practically oriented subject of flows
in networks also. One can even go further back to the Maxwell-Kirchhoff
theory of electrical networks, although this is not a linear problem, and
say the subject began there. Actually, the earliest reference I know of
to work that can be regarded as in this area is a paper by Monge in 1781.

So much for history. We now turn to some of the main concepts and 
theorems about network flows. 



BASIC CONCEPTS 

Figure 1 shows a network and introduces some notation. The six
circles in the figure are called nodes. They are indexed by i, i running
from 1 to n. In addition there are directed arcs, denoted by ordered
pairs (i, j), to be interpreted "from i to j," i and j running from 1 to n.
There are also arc capacities, nonnegative numbers denoted by c_ij. For
instance, arc (1,2) has capacity 7 and arc (2,4) has capacity 6. Capacity
simply means an upper bound on the amount of flow that can take place
in an arc, in the direction of its orientation, in some steady-state
situation. For instance, 8 units per unit time can pass from 3 to 5.

Nodes i, i = 1, ..., n
Arcs (i, j) (from i to j), i, j = 1, ..., n
Arc capacities c_ij ≥ 0

Fig. 1

It turns out that it really doesn't make any difference, for most flow
problems, whether oriented or unoriented arcs are assumed. In some
problems it does make a real difference, but we'll assume directed
arcs unless otherwise stated.

Figure 2 shows a flow through the network from node 1 on the left
to node 6 on the right. This flow is an assignment of numbers to the
arcs of the network, here shown by the second number on each arc, such
that at every intermediate node (here, nodes 2, 3, 4, 5) conservation
holds, that is, flow in equals flow out. For example, at node 2 there are
4 units coming in and 3 + 1 = 4 units going out. Because of the conservation
condition at nodes 2, 3, 4 and 5, the net flow of 7 units out of node 1 is
equal to the net flow of 7 units into node 6. In general, Eqs. (1) describe a
flow f = (f_ij) from source to sink of value v, that is, v units get through
the network. In addition there are capacity constraints and nonnegativity
requirements (2).

Flow f from 1 (source) to n (sink) of value v:

$$\sum_j f_{ij} - \sum_j f_{ji} = \begin{cases} v, & i = 1, \\ 0, & i = 2, \dots, n-1, \\ -v, & i = n, \end{cases} \tag{1}$$

$$0 \le f_{ij} \le c_{ij}. \tag{2}$$

Problem: construct a flow from source to sink of maximal value.

Fig. 2

Cut separating source and sink: a partition L, L̄ of the nodes with the
source in L and the sink in L̄. From (1) and (2),

$$v = \sum_{i \in L,\, j \in \bar L} (f_{ij} - f_{ji}) \le \sum_{i \in L,\, j \in \bar L} c_{ij}.$$

Fig. 3

Given this formulation, a very natural problem that suggests itself is: 
Push as much as possible through the network. That is, maximize the 
variable v subject to the equations (1) and inequalities (2). This is probably 
the most fundamental problem about flows in networks, and we shall 
state some of the basic theorems about this problem. But in order to do 
this, the notions illustrated in Figs. 3 and 4 are needed. 

Figure 3 introduces the notion of a cut in a network. The nodes of the
network are split into two sets L and L̄, one of which contains the source
and the other the sink. This division is called a cut separating source and
sink. If the flow equations are added up over the nodes in the source set of
a cut, then the inequalities (2) yield the result shown in Fig. 3: the value
v of a flow f is equal to the net flow across any cut, and is hence bounded
above by the capacity of the cut. We shall see in a moment that equality
holds (i.e., the upper bound is achieved) for some flow and some cut.

Figure 4 illustrates one way of increasing the value of a flow, namely,
by using what might be called a flow-augmenting path, shown by the dotted
arcs, which form a path from source to sink. Some arcs are traversed
with their orientation and some against their orientation in going from
source to sink. In order for a path to be flow-augmenting, we want the
property that for the forward arcs (those traversed with their orientation)
the arc flow is less than capacity, whereas for the reverse arcs of the
path (those traversed against their orientation) the arc flow is positive.
Thus if the flow is changed by adding ε > 0 to the flow in forward arcs of
the path and subtracting ε from the flow in reverse arcs, a new flow is
obtained whose value is ε units greater than the old flow. The largest value
for ε in Fig. 4 is ε = 2 (the bind coming on the reverse arc), so the new
flow has value 9. It turns out that it suffices to look for flow-augmenting
paths in order to maximize flow through a network [3].

Figure 5 shows the flow that is obtained if the change ε = 2 is made
along the flow-augmenting path in Fig. 4. Now observe the cut shown by
the wavy line. Every arc that goes from the source side to the sink side of
this cut is carrying flow at capacity. On the other hand, any arc that comes
back across the cut the wrong way carries no flow. Hence equality
holds in the inequality of Fig. 3, and this flow is consequently maximal,
while the cut is minimal. This illustrates the most fundamental theorem
about maximal flow.
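The optimality certificate of Fig. 5 can be computed as well. Label every node reachable from the source along forward arcs below capacity or reverse arcs carrying flow. If the sink is not labeled, the labeled set is the source side of a minimal cut. The network and the maximal flow of value 5 below are hypothetical, not the numbers of Fig. 5.

```python
def labeled_set(cap, flow, source):
    """Nodes reachable from the source along pieces of f-augmenting paths."""
    L, stack = {source}, [source]
    while stack:
        i = stack.pop()
        for (u, v), c in cap.items():
            if u == i and v not in L and flow[(u, v)] < c:
                L.add(v)
                stack.append(v)
            elif v == i and u not in L and flow[(u, v)] > 0:
                L.add(u)
                stack.append(u)
    return L

def cut_capacity(cap, L):
    """Sum of capacities of arcs from L to its complement."""
    return sum(c for (u, v), c in cap.items() if u in L and v not in L)

def net_flow_across(flow, L):
    """Flow on arcs leaving L minus flow on arcs entering L."""
    out_flow = sum(f for (u, v), f in flow.items() if u in L and v not in L)
    in_flow = sum(f for (u, v), f in flow.items() if v in L and u not in L)
    return out_flow - in_flow

cap = {(1, 2): 3, (1, 3): 2, (2, 3): 1, (2, 4): 2, (3, 4): 3}
flow = {(1, 2): 3, (1, 3): 2, (2, 3): 1, (2, 4): 2, (3, 4): 3}
L = labeled_set(cap, flow, 1)
```

The net flow across every cut equals the value 5. For the labeled set this also equals the cut capacity, while a cut such as {1, 3} has capacity 6.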
THEOREMS ON MAXIMAL FLOWS

There are several important theorems about maximal flow. The first,
shown in Fig. 6, says that the situation illustrated in Fig. 5 is general.

Max-flow min-cut theorem: for any network, the maximal
flow value from source to sink is equal to the minimal
cut capacity of all cuts separating source and sink.

Construction theorem: a flow f is maximal if and only if
there is no f-augmenting path.

Integrity theorem: if all arc capacities are integers,
there is an integral maximal flow.

Fig. 6

One of the several proofs [3] of this theorem yields the other two
theorems of Fig. 6 as corollaries. The content of the construction theorem
is: to force more through a network, it suffices to look for a path that
augments the present flow. To make a real construction out of this, some
good way of searching for a flow-augmenting path is needed. There are
such ways, combinatorial in nature, that are quite good computationally.

The integrity theorem follows from the construction theorem, since if
all arc capacities are integers and the initial flow is integral, then the
flow change ε made along a path will be an integer, yielding a new integral
flow. Of course, more fundamentally, the integrity theorem follows from
the fact that the constraint matrix of this linear program has the total
unimodularity property: every subdeterminant has value +1, -1, or 0. But it
is not necessary to look at determinants in order to prove the theorem. It
drops out of the proof of the max-flow min-cut theorem, as does the
construction theorem.

MULTI-COMMODITY FLOWS

We've been talking about a flow of a single commodity from source to
sink. A multi-commodity flow problem [2, 4] is illustrated in Fig. 7.
There are several sources, indicated by the s's, and several sinks,
indicated by the t's, with a pairing between sources and sinks. Source 1 ships
to sink 1, source 2 ships to sink 2, and source 3 ships to sink 3, but the three
simultaneous flows share capacities on arcs. This problem doesn't have
the nice simple features that the single-commodity problem has. A hint
that this is so can be gotten by looking at the examples in Fig. 7. Suppose
each arc has unit capacity in the left network, for instance, and it is
desired to force as much as possible through the network in this
multi-commodity fashion. With integer flows, the best one can do is a flow of
value 1, since a 1-unit flow of any commodity blocks flow for the other
two commodities. On the other hand, using fractions, one can send a half-
unit of each commodity, giving a total flow of 3/2. The right network has
the same features as the left, but for the undirected case.

MULTI-COMMODITY FLOW PROBLEMS

Fig. 7



SOME COMBINATORIAL THEOREMS 

Consider the single-commodity problem shown in Fig. 8. Suppose sup-
plies of a commodity are available at certain points in a network, and
demands are made at other points. In the network shown (ignoring the dotted
arcs), the supplies are 7 units at node 1 and 2 units at node 2, and the
demands are 1 unit at node 7 and 8 units at node 8. When can the demands be
satisfied from the supplies? This question was asked and answered by Gale
several years ago [10]. The idea is to convert this problem to a maximal
flow problem by adding a fictitious source node and a fictitious sink
node and putting in the dotted arcs as indicated. Now interpret the supplies
and demands as capacities of these arcs in the obvious way, and ask: can 9
units be moved from source to sink? If they can, clearly both sink arcs are
saturated and the demands are satisfied. So the question can always be
answered by solving a maximal flow problem. If this analysis is extended a
bit further, an interesting theorem drops out of this situation, as
illustrated in Fig. 9.

FEASIBILITY OF SUPPLIES AND DEMANDS

Fig. 8
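Gale's reduction can be sketched as follows. A fictitious source feeds each supply node up to its supply, each demand node drains into a fictitious sink up to its demand, and the demands can be met exactly when the maximal flow saturates every sink arc. The supplies (7 and 2) and demands (1 and 8) come from the text. The internal arcs of Fig. 8 are not recoverable, so the arcs below and the fictitious node names are hypothetical.

```python
from collections import deque

def max_flow_value(cap, s, t):
    """Edmonds-Karp on residual capacities; returns the maximal flow value."""
    res = {}
    for (u, v), c in cap.items():
        res.setdefault(u, {})
        res.setdefault(v, {})
        res[u][v] = res[u].get(v, 0) + c
        res[v].setdefault(u, 0)
    value = 0
    while True:
        pred = {s: None}
        q = deque([s])
        while q and t not in pred:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in pred:
                    pred[v] = u
                    q.append(v)
        if t not in pred:
            return value
        path, v = [], t
        while pred[v] is not None:
            path.append((pred[v], v))
            v = pred[v]
        eps = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= eps
            res[v][u] += eps
        value += eps

def supplies_meet_demands(cap, supply, demand):
    """Add a fictitious source and sink; feasible iff all demand arcs saturate."""
    SRC, SNK = "source*", "sink*"          # hypothetical names for fictitious nodes
    arcs = dict(cap)
    for i, a in supply.items():
        arcs[(SRC, i)] = a
    for j, b in demand.items():
        arcs[(j, SNK)] = b
    return max_flow_value(arcs, SRC, SNK) == sum(demand.values())

feasible = supplies_meet_demands({(1, 8): 6, (1, 7): 2, (2, 8): 2},
                                 {1: 7, 2: 2}, {7: 1, 8: 8})
```

If arc (1, 8) is cut back to 5, at most 7 units can reach node 8. The subset {8} of demand nodes then violates the condition of the supply-demand theorem, and the test returns False.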

The necessity of the condition in the supply-demand theorem is obvious
and holds no interest. The sufficiency is not obvious and is quite
interesting. To paraphrase it: select any subset of the demand nodes and
ask whether enough can be put into that subset to meet the sum of the
demands over the subset, without worrying where the flows go individually.
If this can be done for all subsets of demand nodes, then the supply-
demand constraints are feasible. This is a nice generalization of a well-
known theorem in combinatorial mathematics due to P. Hall, which has to
do with systems of distinct representatives for sets [12]. This theorem is
stated at the bottom of the figure. To interpret it a different way, one might
think of the sets as being jobs and the elements as men, with the object
being to assign men to jobs. The men are qualified only for certain jobs.
For example, man 1 is qualified for jobs a, b and c, etc. Then ask whether,
given such a configuration, it is possible to supply men to these jobs, one
man to each job and no man doing more than one job. Hall's theorem says:
take any subset of the jobs (such as the one indicated by the dotted oval in
the figure) and examine all of the men who are qualified for some job in
that subset; if there are enough men (that is, k men if k jobs were
singled out) who are qualified for some job in the subset, and if this is
true for all subsets of jobs, then one can indeed assign men to jobs. Here
it is not possible, because the dotted subset of these jobs leads back to
only two men. Again, of course, the necessity is obvious. The sufficiency
is the interesting part.

Supply-demand theorem: the supply-demand constraints (2) and
(3) are feasible if and only if, for every subset T' ⊆ T
of demand nodes, there is a flow (depending on T') that
meets the aggregate demand over T' without violating the
supply limitations at nodes of S.

Distinct representatives theorem: given a
family of n subsets of some set, there
is a system of distinct representatives
for this family if and only if every k
of the subsets collectively contain at
least k distinct elements, k = 1, ..., n.

Fig. 9

Existence theorem for a (0,1)-matrix having specified row and
column sums: there is an m by n (0,1)-matrix having row
sums a_i, i = 1, ..., m, and column sums b_j, j = 1, ..., n, if
and only if

$$\sum_{j=1}^{k} b_j \le \sum_{j=1}^{k} a_j^*, \qquad k = 1, \dots, n,$$

with equality for k = n. Here b_1 ≥ b_2 ≥ ... ≥ b_n.

Fig. 10

To prove Hall's theorem from the supply-demand theorem, the 
integrity theorem mentioned previously is needed in order to single out 
an integral flow. That's one of the uses of the integrity theorem; it pro- 
vides a flow approach to many combinatorial problems. 
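The constructive side of this argument fits in a few lines. An augmenting-path assignment routine, which is the labeling method specialized to the bipartite job-man network, either staffs every job or gets stuck exactly when some set of k jobs leads back to fewer than k men. The Python sketch below uses hypothetical qualification lists, not the data of Fig. 9.

```python
def assign_men_to_jobs(qualified):
    """System of distinct representatives via augmenting paths.

    qualified maps each job to the men qualified for it. Returns a dict
    job -> man if every job can be staffed, otherwise None.
    """
    job_of = {}                        # man -> job currently assigned to him

    def try_assign(job, seen):
        for man in qualified[job]:
            if man in seen:
                continue
            seen.add(man)
            # Use a free man, or move his current job along an augmenting path.
            if man not in job_of or try_assign(job_of[man], seen):
                job_of[man] = job
                return True
        return False

    for job in qualified:
        if not try_assign(job, set()):
            return None
    return {job: man for man, job in job_of.items()}

assignment = assign_men_to_jobs({'a': [1, 2], 'b': [1], 'c': [2, 3]})
```

Here job b can only take man 1, so job a is re-routed to man 2 and job c goes to man 3. In the second test case, jobs a and b both lead back only to man 1, so no assignment exists.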

Another combinatorial problem that can be solved using flows is
illustrated in Fig. 10. It may not be the easiest way to solve it, but it is a
mechanical way. That is one of the advantages of using flows on such a
problem: if the problem can be formulated in terms of flows, then little
subsequent imagination is required. The problem here is to construct a
(0,1)-matrix having stipulated row and column sums. Of course, if just a
nonnegative matrix is required rather than a (0,1)-matrix, the condition
is simple: the total of the row sums must equal the total of the column
sums. But for a (0,1)-matrix, the situation is not that simple. The
theorem, due to Ryser [21] and Gale [10], says the following. First,
arrange the columns in monotone decreasing order of their sums. Now take
the row sums and represent them by dots placed as far to the left as
possible. Count the dots in the columns: 6, 6, 3, 3, 2, 2, 0. This sequence is
conjugate to the row-sum sequence, that is, the two sequences are conjugate
partitions of the same integer. In the figure, the conjugate sequence is
denoted by a*. Such a (0,1)-matrix exists if and only if the partial sums of
the column-sum sequence are dominated by the partial sums of the row-
conjugate sequence. Here there is a (0,1)-matrix having the given row and
column sums.

Circulation theorem: there is a flow f satisfying the constraints

$$\sum_j (f_{ij} - f_{ji}) = 0, \quad \text{all nodes } i,$$

$$l_{ij} \le f_{ij} \le c_{ij}, \quad \text{all arcs } ij,$$

if and only if, for every subset X of nodes,

$$\sum_{i \in \bar X,\, j \in X} l_{ij} \le \sum_{i \in X,\, j \in \bar X} c_{ij}.$$

Fig. 11
MINIMAL COST FLOWS

Problem: given capacities c_ij and unit costs a_ij,
construct a flow from source to sink of value v
that minimizes the flow cost Σ a_ij f_ij.

Construction theorem: let f be a minimal cost flow of value v.
Then the flow f' obtained from f by adding ε > 0 to the
flow in forward arcs of a minimal cost f-augmenting path,
and subtracting ε from the flow in reverse arcs of this
path, is a minimal cost flow of value v + ε.

Fig. 12

There is a very simple rule for constructing such a (0,1)-matrix. Simply
take any column and put its 1's in the rows having the biggest sums,
delete the column, reduce the appropriate row sums, and repeat the
procedure in the reduced problem. This rule will construct such a matrix
if there is one, and will lead to trouble otherwise. If this rule is applied
by first selecting the column having the smallest sum, then the next
smallest, and so on, the resulting matrix has some rather remarkable
properties [9].
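Both the Gale-Ryser test and the greedy rule fit in a short sketch. The function names and the small test data are hypothetical. Columns are processed from smallest sum to largest, as in the variant just mentioned, and ties among rows are broken in row order.

```python
def conjugate(row_sums, n):
    """a*_j = number of rows whose sum is at least j, for j = 1..n."""
    return [sum(1 for a in row_sums if a >= j) for j in range(1, n + 1)]

def gale_ryser(row_sums, col_sums):
    """Partial sums of the sorted column sums must be dominated by those of a*."""
    b = sorted(col_sums, reverse=True)
    a_star = conjugate(row_sums, len(b))
    if sum(row_sums) != sum(b):
        return False
    partial_b = partial_a = 0
    for bk, ak in zip(b, a_star):
        partial_b += bk
        partial_a += ak
        if partial_b > partial_a:
            return False
    return True

def build_matrix(row_sums, col_sums):
    """Greedy rule: put each column's 1's in the rows with the largest remaining
    sums, taking columns from smallest sum to largest. Returns None on failure."""
    m, n = len(row_sums), len(col_sums)
    remaining = list(row_sums)
    matrix = [[0] * n for _ in range(m)]
    for j in sorted(range(n), key=lambda c: col_sums[c]):
        if col_sums[j] > m:
            return None
        rows = sorted(range(m), key=lambda i: remaining[i], reverse=True)[:col_sums[j]]
        for i in rows:
            if remaining[i] == 0:
                return None                # the rule "leads to trouble"
            matrix[i][j] = 1
            remaining[i] -= 1
    return matrix if all(r == 0 for r in remaining) else None

m = build_matrix([2, 2, 1], [2, 2, 1])
```

The greedy construction succeeds exactly when the Gale-Ryser test holds. For row sums (3, 0) and column sums (2, 1), both report failure.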

Figure 11 shows another interesting feasibility theorem due to Hoffman
[14], which is concerned not with flows from source to sink but rather with
circulations. Assume flow in equals flow out at every node, put a
lower bound on the flow in each arc, denoted here by l_ij, as well as an upper
bound, and ask: when can these constraints be satisfied? The resulting
theorem is of Hall type. The constraints are feasible if and only if, for
every subset X of nodes, the sum of the lower bounds on arcs pointing into
X does not exceed the sum of capacities on arcs pointing out of X. Again
this theorem can be proved using the max-flow min-cut theorem.

It should be remarked that the integrity theorem holds for the feasibility 
situations that have been presented; that is, if the given data are integers, 
and if there is a feasible flow, then there is an integral feasible flow. 



MINIMAL COST FLOWS 

Another important problem concerning single-commodity flows is the
minimal cost flow problem. In Fig. 12, a network is given with arc
capacities and a source and sink. A cost per unit flow is also given for
each arc. (The second number of each pair in the figure is the cost, the first
number is the capacity, and the circled number is the flow.) Then ask: how
can v units be sent from source to sink at minimal cost? (The figure
shows a minimal cost flow with v = 5.) Perhaps the most basic theorem
about this problem is the construction theorem stated in the figure. Take
f, which is assumed to be a minimal cost flow of a certain value. Look for
a flow-augmenting path with respect to f that has the least path cost of all
flow-augmenting paths with respect to f. The path cost here (the dotted
path) is obtained by summing the costs for forward arcs and subtracting
the sum of costs for reverse arcs. The path shown has cost 10. If you
alter the flow along a minimal cost flow-augmenting path, then the new
flow is a minimal cost flow for the new flow value. So to
solve minimal cost flow problems, all that is needed, once the
process has been started, is a routine for searching out a minimal cost
flow-augmenting path with respect to a given flow. There are very
efficient combinatorial methods for doing this. And starting the process
is no problem if all arc costs are nonnegative.
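The construction theorem leads directly to the successive-shortest-path method. In this Python sketch the network is hypothetical and the names are illustrative. It starts from the zero flow, which is a minimal cost flow of value 0 when all costs are nonnegative. A cheapest f-augmenting path is then found with Bellman-Ford, because reverse arcs enter the path cost with negative sign, exactly as in the path-cost rule above.

```python
def min_cost_flow(arcs, source, sink, v):
    """Send v units from source to sink at minimal cost, or return None.

    arcs maps (i, j) -> (capacity, unit_cost) with nonnegative costs.
    Returns (cost, flow).
    """
    flow = {a: 0 for a in arcs}
    nodes = {i for a in arcs for i in a}
    INF = float("inf")
    sent = 0
    while sent < v:
        dist = {i: INF for i in nodes}
        dist[source] = 0
        pred = {}
        for _ in range(len(nodes) - 1):
            for (i, j), (cap, cost) in arcs.items():
                if flow[(i, j)] < cap and dist[i] + cost < dist[j]:
                    dist[j] = dist[i] + cost          # forward arc
                    pred[j] = ((i, j), +1, i)
                if flow[(i, j)] > 0 and dist[j] - cost < dist[i]:
                    dist[i] = dist[j] - cost          # reverse arc, cost negated
                    pred[i] = ((i, j), -1, j)
        if dist[sink] == INF:
            return None                               # value v is not attainable
        path, k = [], sink
        while k != source:
            arc, sign, prev = pred[k]
            path.append((arc, sign))
            k = prev
        eps = min(arcs[a][0] - flow[a] if s > 0 else flow[a] for a, s in path)
        eps = min(eps, v - sent)
        for a, s in path:
            flow[a] += s * eps
        sent += eps
    return sum(arcs[a][1] * f for a, f in flow.items()), flow

arcs = {('s', 'a'): (2, 1), ('s', 'b'): (2, 4), ('a', 'b'): (1, 1),
        ('a', 't'): (1, 5), ('b', 't'): (3, 1)}     # hypothetical network
cost, flow = min_cost_flow(arcs, 's', 't', 3)
```

The first path chosen is s-a-b-t (cost 3), and the next is s-b-t (cost 5). Three units therefore cost 13, and a fourth unit must use s-a-t at cost 6.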

There are many applications of minimal cost flows. Conspicuous
among them are the Hitchcock problem and PERT scheduling problems.
Another application, to maximal dynamic flows, is described in [6]. Sup-
pose a network is given in which each arc has a capacity and
also a traversal time. The object is to send as much through the net-
work as possible in a given time interval. If capacities are constant over
time, there is a very simple way of doing this as a minimal cost flow
problem.

MULTITERMINAL FLOWS 

Figures 13 and 14 get into another area that has been explored pri-
marily by Gomory and Hu [11]. Here the concern is with multiterminal
flows. These are to be distinguished from the multicommodity flows
mentioned earlier in the sense that, while attention will be focused on many
flows, we shall not be dealing with simultaneous flows. Instead, questions are
asked about all pairs of nodes taken as source and sink for a given net-
work. The results here are for undirected networks, and are very pretty.

MULTI-TERMINAL FLOWS

Problems:

(a) Let v_ij denote the maximal flow value between i and j.
    Determine the flow value function v = (v_ij) efficiently.

(b) What are conditions on a given v = (v_ij) in order that
    it come from a network?

Fig. 13

(a) [Equivalent tree for the network of Fig. 13]

(b) A symmetric, nonnegative v = (v_ij) is realizable as the
    flow value function of an undirected network
    if and only if the "triangle" inequality

$$v_{ij} \ge \min(v_{ik}, v_{kj})$$

    holds for all i, j, k.

Fig. 14

Gomory and Hu considered several problems. One is this: suppose we
let v_ij denote the maximal flow value between nodes i and j. How do you
determine the flow value function v in an efficient manner? Clearly,
v = (v_ij) can be determined by solving n(n - 1)/2 single-terminal flow
problems, but it is possible to do a whole lot better.

The second problem is this: given v = (v_ij), when is v the flow value
function of some network?

Let's look at (b) first. It says: a symmetric, nonnegative v is
realizable as the flow value function of an undirected network if and only
if a kind of "triangle" inequality holds: for any triple of nodes i, j and k,
v_ij must be greater than or equal to the minimum of v_ik and v_kj. This
triangle inequality puts very severe limitations on the functions v that are
realizable. Among other things, it implies that v can take on at most
n - 1 numerically distinct values if n is the number of nodes in the
network.

We turn now to (a). The point here is that any network is flow-
equivalent to a tree. An equivalent tree for the example is shown. For
instance, suppose you ask yourself: what is the maximal flow between
1 and 4 in this network? Go to the equivalent tree, proceed from
1 to 4 along the unique path joining them, and take the minimum of the
numbers you encounter (here, 6); that is the maximal flow value.
The number 6 is the capacity of the cut separating 1 and 3 from 2, 4, 5, 6
(or the cut separating 1, 2, 3, 5 from 4 and 6) in the original network.
Thus only n - 1 cuts are relevant in solving the multi-
terminal maximal flow problem for all pairs of nodes, and each of these
is represented by an arc of the equivalent tree. Moreover, such a cut-
tree can be constructed by solving precisely n - 1 single-terminal
maximal flow problems, as described in [11].
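Once an equivalent tree is known, every v_ij is a path minimum. The sketch below uses a hypothetical five-node tree, not the tree of Fig. 14. It computes v for all pairs, then checks the triangle inequality and the bound of n - 1 distinct values.

```python
from itertools import permutations

def tree_flow_values(n, tree):
    """Flow value function from an equivalent tree on nodes 1..n.

    tree is a list of (i, j, capacity) edges; v[(i, j)] is the smallest
    capacity on the unique tree path between i and j.
    """
    adj = {i: [] for i in range(1, n + 1)}
    for i, j, c in tree:
        adj[i].append((j, c))
        adj[j].append((i, c))
    v = {}
    for s in adj:
        # Walk the tree from s, carrying the smallest capacity seen so far.
        stack, seen = [(s, float("inf"))], {s}
        while stack:
            u, bottleneck = stack.pop()
            for w, c in adj[u]:
                if w not in seen:
                    seen.add(w)
                    v[(s, w)] = min(bottleneck, c)
                    stack.append((w, v[(s, w)]))
    return v

def satisfies_triangle(v, n):
    """Check v_ij >= min(v_ik, v_kj) for all distinct i, j, k."""
    return all(v[(i, j)] >= min(v[(i, k)], v[(k, j)])
               for i, j, k in permutations(range(1, n + 1), 3))

tree = [(1, 3, 8), (3, 2, 6), (2, 4, 5), (2, 5, 7)]   # hypothetical
v = tree_flow_values(5, tree)
```

For example, v between 1 and 4 is min(8, 6, 5) = 5. Any function produced this way satisfies the triangle inequality, which is the easy direction of (b).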

This survey of basic concepts and results about flows in networks has
necessarily omitted everything in the way of detail, and has not even
mentioned much substantial work in this field. I hope that it has at least
imparted some knowledge and feeling for the subject.
MULTI-COMMODITY NETWORK FLOWS



T. C. Hu 

ABSTRACT 

A network is a set of nodes N_i connected by branches with nonnegative
branch capacities b_ij, which indicate the maximum amount of flow that can
pass through the branch from N_i to N_j. Given all b_ij, there is a maximum
flow f(i;j) from N_i to N_j using all branches. The max-flow min-cut theorem
of Ford and Fulkerson [1] gives the maximum flow of one commodity.
The present paper deals with simultaneous flows of two commodities in a
network.

Let f(k;k') be the value of the kth flow from N_k to N_k'. Let c(k;k') be the
capacity of a minimum cut separating N_k and N_k'; let c(1-2; 1'-2') be the
capacity of a minimum cut with N_1, N_2 in one component and N_1', N_2' in the
other component; and let c(1-2'; 1'-2) be the capacity of a minimum cut with
N_1, N_2' in one component and N_1', N_2 in the other component.

Under the assumption that b_ij = b_ji, the two flows are feasible if and
only if

$$f(1;1') \le c(1;1'),$$

$$f(2;2') \le c(2;2'),$$

$$f(1;1') + f(2;2') \le \min\,[c(1-2;\,1'-2'),\; c(1-2';\,1'-2)],$$

and

$$\max\,[f(1;1') + f(2;2')] = \min\,[c(1-2;\,1'-2'),\; c(1-2';\,1'-2)].$$

An algorithm similar to the labeling method for constructing the two
flows is obtained.
TRANSPORTATION PROBLEMS WITH DISTRIBUTED LOADS

M. D. McIlroy

ABSTRACT 

Let F be a distribution of load over a space X; let d(x - x') be the cost
of servicing a load at x from a source at x'; and let x_i, i = 1, 2, ..., n, be
given locations of n sources. We seek an optimal assignment of load to
sources, according to distributions Φ_i, minimizing

$$\sum_i \int_X d(x - x_i)\, d\Phi_i(x)$$

subject to the constraints that all loads in any set S are served,

$$\sum_i \int_S d\Phi_i = \int_S dF,$$

and that the capacity of source x_i is limited by

$$\int_X d\Phi_i \le M_i.$$

An optimal solution consists of a set of exclusive and exhaustive regions
R_i which form the supports of the Φ_i. Points on the boundary between R_i and
R_j satisfy

$$d(x - x_i) - d(x - x_j) = \text{const}.$$

(In particular, with cost proportional to distance, each source serves a
simply connected region whose boundaries are hyperbolas.)

The regions are characterized by linear programming dual "potentials,"
V_i, which determine the boundary loci according to

$$d(x - x_i) - d(x - x_j) = V_i - V_j.$$

In terms of potentials, the problem of finding the n distributions Φ_i is
equivalent to finding n numbers V_i (from which the regions R_i follow by the
preceding formula) such that

$$\int_{R_i} dF \le M_i$$

with the complementary slackness condition that strict inequality in the
first formula is accompanied by equality in the second.

Due to their geometrical simplicity, distributed transportation problems 
often admit rough-and-ready solution techniques. Such methods based on 
our duality theory are discussed. 



LEAST COST ESTIMATING AND SCHEDULING 
WITH LIMITED RESOURCES 



C. F. Fey 

ABSTRACT 

J. E. Kelley and D. R. Fulkerson developed algorithms for determining
least cost project schedules for projects composed of many activities which
must be performed in certain sequences. They assume that unlimited
resources are available.

In practice, only a limited amount of scarce resources is available at 
any given moment. This paper describes an algorithm for generating least 
cost project schedules when the available resources are limited. An algo- 
rithm for one limited resource is developed in detail. 

The algorithm requires a project defined by a network of activities, the 
cost-time relationship for each activity, and the amount of scarce resources 
available at any time. Given that the project must be completed in a certain 
time, schedules are determined which minimize the project cost under the 
constraints imposed by the scarce resource. A series of these minimum 
cost schedules is generated, each differing from its predecessor by the 
time allotted to the project. The set of schedules spans all feasible project
durations.
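
The abstract does not give Fey's algorithm itself, so the Python sketch
below shows only two building blocks that a limited-resource scheduler
needs, under assumptions of its own: integer activity durations, a single
scarce resource used at a constant rate by each activity, and hypothetical
names `earliest_starts`, `resource_profile`, and `respects_limit`. It
computes critical-path earliest start times and tests whether a schedule
stays within the available amount of the resource; the cost-time
relationship is not modeled.

```python
from graphlib import TopologicalSorter


def earliest_starts(durations, preds):
    """Forward pass of the critical path method.

    durations: activity -> duration in whole time periods
    preds:     activity -> activities that must finish before it starts
    """
    graph = {a: tuple(preds.get(a, ())) for a in durations}
    start = {}
    # static_order() yields every activity after all of its predecessors.
    for a in TopologicalSorter(graph).static_order():
        start[a] = max((start[p] + durations[p] for p in graph[a]), default=0)
    return start


def resource_profile(start, durations, usage):
    """Units of the scarce resource in use during each time period."""
    horizon = max(start[a] + durations[a] for a in start)
    profile = [0] * horizon
    for a, s in start.items():
        for t in range(s, s + durations[a]):
            profile[t] += usage[a]
    return profile


def respects_limit(start, durations, usage, available):
    """True if the schedule never needs more than `available` units at once."""
    return all(u <= available for u in resource_profile(start, durations, usage))
```

For activities A (3 periods, 2 units), B (2 periods, 2 units), and C
(2 periods, 1 unit, after A and B), the earliest-start schedule finishes at
time 5 but needs 4 units in its first two periods. With only 3 units
available, starting B at time 3 (so that C starts at 5) is feasible and
stretches the project to 7 periods.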


MATHEMATICAL PROGRAMMING SOLUTION OF TRAVELING
SALESMAN EXAMPLES



Frederick Bock 

ABSTRACT 

A mathematical problem which he designated the messenger problem 
(Botenproblem) was stated by Karl Menger [1] on February 5, 1930, at a 
mathematical colloquium in Vienna as follows (in translation):

We designate as the Messenger Problem (since this problem is en- 
countered by every postal messenger, as well as by many travelers) 
the task of finding, for a finite number of points whose pairwise dis- 
tances are known, the shortest path connecting the points. This prob- 
lem is naturally always solvable by making a finite number of trials. 
Rules are not known which would reduce the number of trials below 
the number of permutations of the given points. The rule, that one 
should first go from the starting point to the nearest point, then to 
the point nearest this, etc., does not in general result in the shortest 
path. 
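
Menger's two observations can be checked directly on a small instance. The
Python sketch below enumerates all $(n-1)!$ orderings after a fixed starting
point and compares the result with the nearest-neighbor rule. It assumes a
closed tour that returns to its start (as in the traveling salesman
formulation that follows, rather than the open path of Menger's statement),
uses hypothetical function names, and its distance matrix is invented for
illustration.

```python
from itertools import permutations


def tour_length(c, tour):
    """Length of the closed tour that visits `tour` in order and returns to its start."""
    return sum(c[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


def brute_force_tour(c):
    """Try every ordering of points 1..n-1 after the fixed starting point 0."""
    n = len(c)
    best = min(permutations(range(1, n)), key=lambda p: tour_length(c, (0,) + p))
    return (0,) + best


def nearest_neighbor_tour(c):
    """The rule Menger cites: always go next to the nearest unvisited point.

    Ties, if any, are broken arbitrarily.
    """
    tour, unvisited = [0], set(range(1, len(c)))
    while unvisited:
        nxt = min(unvisited, key=lambda j: c[tour[-1]][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tuple(tour)
```

On the four-point matrix used in the test, the nearest-neighbor rule
produces the tour 0, 1, 2, 3 of length 28, while enumeration finds 0, 2, 1, 3
of length 13.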

Renamed the traveling salesman problem, this problem in various versions 
has received much attention in recent years because of both theoretical 
interest and practical importance. However, the methods so far proposed
lack the power and elegance of the algorithms for related problems such
as the assignment problem and the minimum tree problem.

Mathematical programming solutions have been obtained for more than
30 examples of the following traveling salesman problem, including all
examples found in the literature: Given an $n \times n$ matrix of nonnegative
integers $c_{ij}$ (some of which may be arbitrarily large), find values $x_{ij}$
that minimize

$$z = \sum_{i,j \in S} c_{ij} x_{ij} \qquad (1)$$

and satisfy

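1. Primary constraints

a. Matrix constraint

$$\sum_{i,j \in S} x_{ij} \ge n = b_1 \qquad (2)$$

b. Line constraints

i. Row constraints

$$\sum_{j \in S} x_{ij} \le 1 = b_i, \quad i \in S \qquad (3)$$

ii. Column constraints

$$\sum_{i \in S} x_{ij} \le 1 = b_j, \quad j \in S \qquad (4)$$

c. Submatrix constraints

$$\sum_{i,j \in S_k} x_{ij} \le n_k - 1 = b_k, \quad k \in Q \qquad (5)$$

d. Boolean constraints

$$x_{ij} \in \{0, 1\}, \quad i, j \in S \qquad (6)$$

2. Secondary constraint

$$\sum_{i,j \in S} a_{ij} x_{ij} \le b_2 \qquad (7)$$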

$S$ denotes the set of matrix indices, $\{1, 2, \ldots, n\}$. $S_k$ denotes a proper
subset of $S$ having $n_k$ elements. $Q$ is the set of indices over which $k$ ranges,
i.e., $\{1, 2, \ldots, 2^n - 2\}$. The secondary constraint (7) is a linear inequality
implied by the nonlinear (Boolean) constraints (6) in conjunction with the
linear constraints (2)-(5). A secondary constraint is required in some
examples to eliminate fractional solutions in the $x_{ij}$ that satisfy all primary
linear constraints and yield a smaller value of $z$ than does any integer
solution. Nonnegative integer values of the $a_{ij}$ and $b_2$ of the secondary
constraint are developed as necessary in the solution of particular examples; 
initially they are zero valued. 
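
Constraint (5) ranges over every proper, nonempty subset $S_k$ of $S$, which
is why $Q$ has $2^n - 2$ members. For small $n$ these constraints can simply
be enumerated. The Python sketch below (hypothetical name
`violated_submatrix_constraints`, 0-based indices, a 0/1 matrix stored as
nested lists) lists the subsets whose constraint a Boolean $x$ violates.
Subsets of size one give $x_{ii} \le 0$, and an assignment that splits into
subcycles violates (5) for the index set of each subcycle.

```python
from itertools import combinations


def violated_submatrix_constraints(x):
    """Proper subsets S_k whose submatrix constraint (5) is violated by x.

    x is an n x n 0/1 matrix (nested lists, 0-based indices). Constraint (5)
    requires the sum of x[i][j] over i, j in S_k to be at most |S_k| - 1.
    All 2**n - 2 proper, nonempty subsets are enumerated, so this is only
    practical for small n.
    """
    n = len(x)
    violated = []
    for size in range(1, n):
        for subset in combinations(range(n), size):
            total = sum(x[i][j] for i in subset for j in subset)
            if total > size - 1:
                violated.append(subset)
    return violated
```

An assignment made of the two subcycles 0, 1 and 2, 3 violates (5) for
exactly those two index sets, while a single tour through all four points
violates none.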

The dual problem is: For values of $b_1$, $b_i$, $b_j$, $b_k$, $b_2$, and $a_{ij}$ as defined
above in connection with an $n \times n$ matrix of data $c_{ij}$, find values $u_1$, $u_i$,
$u_j$, $u_k$, and $u_2$ that maximize

$$w = u_1 b_1 + \sum_{i \in S} u_i b_i + \sum_{j \in S} u_j b_j + \sum_{k \in Q} u_k b_k + u_2 b_2 \qquad (8)$$

and satisfy

$$u_1 + u_i + u_j + \sum_{k \in Q} a_{ijk} u_k + a_{ij} u_2 \le c_{ij}, \quad i, j \in S \qquad (9)$$



The coefficients $a_{ijk}$ have the value 1 if $i, j \in S_k$ and 0 otherwise. Add
auxiliary integer variables $y_1$, $y_i$, $y_j$, $y_k$, $y_2$, and $v_{ij}$ to the left sides of
constraints (2), (3), (4), (5), (7), and (9), respectively, to measure
infeasibility and slack and to convert inequalities to equations. Restrict
$u_1$ to be a nonnegative integer, and $u_i$, $u_j$, $u_k$, and $u_2$ to be nonpositive
integers. In any optimum feasible solution $y_1 = y_i = y_j = y_2 = 0$,
$y_k \ge 0$, $v_{ij} \ge 0$, $u_k y_k = v_{ij} x_{ij} = 0$
for all subscripts, and z = w. 
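
As a small numerical check of the dual (8)-(9), the Python sketch below
computes the slacks $v_{ij}$ of (9), the primal value $z$, and the dual value
$w$ in the special case where every $u_k$ and $u_2$ is zero (no submatrix or
secondary terms). It assumes $b_1 = n$ and $b_i = b_j = 1$, as in the matrix
and line constraints; the function name `dual_check` and the example data
are illustrative only.

```python
def dual_check(c, x, u1, u_row, u_col):
    """Slacks of (9), primal value z and dual value w, with every u_k and u_2 zero.

    c, x:          n x n cost matrix and 0/1 assignment (nested lists, 0-based)
    u1:            nonnegative weight of the matrix constraint, taking b_1 = n
    u_row, u_col:  nonpositive row and column weights, taking b_i = b_j = 1
    """
    n = len(c)
    v = [[c[i][j] - (u1 + u_row[i] + u_col[j]) for j in range(n)] for i in range(n)]
    z = sum(c[i][j] * x[i][j] for i in range(n) for j in range(n))
    w = u1 * n + sum(u_row) + sum(u_col)
    return v, z, w
```

For the 2 x 2 costs [[100, 3], [4, 100]] (the 100s standing in for the
arbitrarily large diagonal), the assignment with 0-based $x_{01} = x_{10} = 1$,
together with $u_1 = 7$, row weights $(-4, -3)$, and column weights $(0, 0)$,
gives every $v_{ij} \ge 0$, $v_{ij} x_{ij} = 0$ for all cells, and $z = w = 7$.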

The method developed for this problem is an extension of the Hungarian
method for the assignment problem [2, 3]; the latter is defined by (1)-(6)
above with the omission of submatrix constraints (5). The present method
has three phases and throughout shares these characteristics with the
Hungarian method: it is dual feasible, primal feasible in the Boolean
constraints, and all integer; at each iteration there is exhaustive search,
limited to those cells $ij$ with $v_{ij} = 0$, resulting in a maximal set of
$x_{ij} = 1$
cover defined by unit changes in the u's, except u lf with 2x - n = ZAu. The 
cover is then applied with maximum multiplicity, permitting a correspond- 
ing increase in u t and a strict increase in w. The search and revaluation 
are iterated until demonstration of infeasibility, attainment of the desired 
solution, or (as at present) termination of the phase. 

Phase 1. By (5), make $c_{ij}$ arbitrarily large for $i = j$. Comply with (3),
(4), (6). Find the minimum assignment by the Hungarian method. Covers
consist of lines only. Terminate with (2)-(6), but not (5), satisfied. At
most, nonzero weights $u_1$, $n - 1$ of the $u_i$, and $n - 1$ of the $u_j$ are
necessary.

Phase 2. Change one $x_{ij}$ from 1 to 0 in each subcycle resulting from
phase 1. Comply with (3)-(6). Covers consist of lines and submatrices;
the latter may occur in a negative sense (relaxation of $u_k$) as well as in a
positive sense. Weighted submatrices are always pairwise disjoint or
contained one in the other. Terminate with (3)-(6) but not (2) satisfied. The
latter cannot be satisfied because there is no minimal cover consisting of
lines and submatrices and there is a fractional solution better than any
integer solution. At most, nonzero weights $u_1$, $n - 1$ of the $u_i$, $n - 1$ of
the $u_j$, and $n - 3$ of the $u_k$ are necessary.
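
Phase 2 begins by breaking each subcycle of the phase 1 assignment, and
finding those subcycles is a permutation-cycle decomposition. A minimal
Python sketch follows, assuming the assignment is stored as a list with
`assignment[i] = j` whenever $x_{ij} = 1$ (0-based; a hypothetical
representation). The abstract does not say which cell of each subcycle is
set to 0, so the sketch stops at listing the cycles.

```python
def subcycles(assignment):
    """Split an assignment (a permutation of 0..n-1) into its cycles.

    assignment[i] = j means x_ij = 1. A cycle shorter than n is a subcycle,
    and it violates submatrix constraint (5) for its own index set.
    """
    n = len(assignment)
    seen = [False] * n
    cycles = []
    for start in range(n):
        if seen[start]:
            continue
        cycle, i = [], start
        while not seen[i]:  # follow i -> assignment[i] until the cycle closes
            seen[i] = True
            cycle.append(i)
            i = assignment[i]
        cycles.append(cycle)
    return cycles
```

The assignment [1, 0, 3, 4, 2] splits into the subcycles [0, 1] and
[2, 3, 4], each of which would have one cell changed from 1 to 0 at the
start of phase 2; [1, 2, 3, 4, 0] is already a single tour.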

Phase 3. Form increments to the secondary constraint (7) by making
$\Delta a_{ij}$ the sum of (1 if $v_{ij} = 0$) plus (1 for each submatrix $k$ such that
$a_{ijk} = 1$ and $u_k < 0$), and $\Delta b_2$ the sum of $x_{ij}$ for $i, j \in S$ plus the sum
of $b_k$ for $k \in Q$ and $u_k < 0$. Comply with (3)-(7). Covers consist of
increments to the secondary constraint together with relaxation of nonzero
$u_k$. At most, $3n - 3$ nonzero weights in all are necessary, i.e., $u_1$,
$n - 1$ of the $u_i$, $n - 1$ of the $u_j$, $n - 3$ of the $u_k$, and $u_2$.


NETWORK ALGORITHMS FOR COMBINATORIAL AND DISCRETE
VARIABLE OPTIMIZATION PROBLEMS



John W. Suurballe 

ABSTRACT 

A mathematical technique which shows considerable potential for repre- 
senting and solving discrete variable problems uses the concept of a 
directed network and the shortest path through it. The many applications 
of this method are not yet widely known; it is the purpose of this paper to 
present some results in shortest and Kth shortest route algorithms, and
illustrate their application by obtaining algorithms for several well-known
optimization problems of current interest in industry and operations
research. These are the traveling salesman, assignment, and more complex,
related problems; a combinatorial wiring problem; a job-shop scheduling
problem; and problems of system construction for maximum economy. 

Informally, the "shortest route in a network" problem is as follows:
There is given a collection of nodes, and directed branches between pairs 
of nodes, forming a network or maze. In some way appropriate to the prob- 
lem, an origin node A and destination node B are given, and we are con- 
cerned with the various paths from A to B, observing always the "one- 
way" rule specified by the arrows. Each branch in the network has a dis- 
tance associated with it, and the distance along a path from A to B is the 
sum of distances of all branches used in that path. The shortest route 
problem is one of finding (efficiently) the path or paths from A to B with 
minimal distance. 
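
For nonnegative branch distances, one standard shortest route method is
Dijkstra's label-setting algorithm. The Python sketch below uses it as a
stand-in (it is not the node-ordering algorithm of this paper). The network
is stored as a hypothetical dictionary mapping each node to its outgoing
`(next_node, distance)` branches, so the one-way rule is built in; node
labels are assumed to be mutually comparable (e.g. all strings), since heap
entries with equal distances are compared by node.

```python
import heapq
import math


def shortest_route(branches, origin, destination):
    """Dijkstra's label-setting method for a shortest route.

    branches: dict mapping node -> list of (next_node, distance) pairs,
    one entry per directed branch; all distances must be nonnegative.
    Returns (distance, path), or (math.inf, []) if destination is unreachable.
    """
    dist = {origin: 0}
    prev = {}
    heap = [(0, origin)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == destination:
            break  # the first time the destination is popped, its label is final
        if d > dist[node]:
            continue  # stale entry: a shorter label was already found
        for nxt, length in branches.get(node, ()):
            nd = d + length
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if destination not in dist:
        return math.inf, []
    path = [destination]
    while path[-1] != origin:
        path.append(prev[path[-1]])
    return dist[destination], path[::-1]
```

In the small network of the test, with branches A to C (2), A to D (5),
C to D (1), C to B (7), and D to B (2), the route A, C, D, B of length 5 is
returned; asking for a route from B back to A returns infinity, since no
branch leaves B.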

Historically, the idea and method of finding the shortest route in a
network was first presented as an application of linear programming to
discrete extremum problems by G. B. Dantzig in 1956. Since Dantzig's paper,
several more efficient algorithms have been developed for finding shortest
routes.

Our network representations of the above combinatorial problems can 
be divided into two types. In the first type we have an "exact"
representation, exact in the sense that every path in the network is one of the accept-
able alternatives in the problem, every alternative is a path in the network, 
and the sum of distances along a network path is exactly the cost associated 
with the corresponding alternative. 

Using a node-ordering property of our particular networks, a shortest
route algorithm is developed and applied to obtain the following results: 

1. Network algorithms for the wiring problem, and the assignment 
problem, using an N-cube model. 

2. The optimum location of relay points (or supply points) in a given
discrete set of potential locations, for a communication (supply) 
system with interrelated choice constraints. 

The network algorithms result in neat tabular arrangements in which 
addition, subtraction, and comparison are the only numerical operations. 
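
Suppose the nodes can be numbered so that every branch leads from a
lower-numbered node to a higher-numbered one. This is our assumption about
what a node-ordering property provides; the abstract does not define it.
Under that assumption, shortest distances from the origin follow from a
single pass in node order, using only addition and comparison. A Python
sketch with a hypothetical function name, a branch list of `(i, j, length)`
triples, and node 0 as the origin:

```python
import math


def ordered_shortest_distances(n_nodes, branches):
    """Shortest distances from node 0 in a network whose nodes are numbered
    so that every branch (i, j, length) has i < j.

    One pass in node order suffices: when node j is reached, every branch
    entering it starts at an already-settled node. Only addition and
    comparison are used. Unreachable nodes keep distance math.inf.
    """
    incoming = [[] for _ in range(n_nodes)]
    for i, j, length in branches:
        if not i < j:
            raise ValueError(f"branch ({i}, {j}) violates the node ordering")
        incoming[j].append((i, length))
    dist = [math.inf] * n_nodes
    dist[0] = 0
    for j in range(1, n_nodes):
        for i, length in incoming[j]:
            dist[j] = min(dist[j], dist[i] + length)
    return dist
```

Because each node is settled only after all of its predecessors, this pass
also tolerates negative branch lengths, which a label-setting method such
as Dijkstra's does not.
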

In the second of our two types of network representation, a problem is
not represented exactly but is imbedded in a network and represented
inexactly, in the sense that all acceptable alternatives are network paths
but not all network paths are acceptable alternatives. In these problems
the shortest route algorithms must include additional constraints that
automatically reject unwanted network paths when they turn out to be
optimal. To solve this problem generally, some new results in shortest
route algorithms are developed. This material is applied, along with
certain network representations, to get the following results:

1. Algorithm for Kth Shortest Routes in a Network.

2. Network algorithms for the traveling salesman, assignment, and 
related problems. The traveling salesman algorithm provides a neat 
example and is given in detail. 

3. Network algorithms for the job-shop scheduling problem, allowing 
set-up times for both jobs and machines, which depend in general on 
the order of operation. This application is more difficult, and only 
sketched. 
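
When branch distances are nonnegative, one simple way to obtain the first
through Kth shortest routes is a best-first search over partial routes in
which each node may be expanded at most $K$ times. The Python sketch below
uses that generic technique, not necessarily the algorithm of the paper; the
name `k_shortest_routes` and the dictionary representation are
hypothetical. In a network without directed cycles the routes returned are
simple paths; with cycles they may revisit nodes. Routes are not continued
past the destination, and node labels are assumed to be mutually comparable.

```python
import heapq


def k_shortest_routes(branches, origin, destination, k):
    """Up to k shortest origin-destination routes, in order of length.

    branches: dict node -> list of (next_node, length) pairs, lengths >= 0.
    Each node is expanded at most k times; a route is returned as
    (length, [origin, ..., destination]).
    """
    count = {}
    heap = [(0, [origin])]
    found = []
    while heap and len(found) < k:
        length, path = heapq.heappop(heap)
        node = path[-1]
        count[node] = count.get(node, 0) + 1
        if node == destination:
            found.append((length, path))
            continue  # do not extend routes beyond the destination
        if count[node] > k:
            continue  # this node has already been expanded k times
        for nxt, d in branches.get(node, ()):
            heapq.heappush(heap, (length + d, path + [nxt]))
    return found
```

Listing routes in order of length and stopping at the first one that is an
acceptable alternative is one way to reject unwanted network paths in an
inexact representation; whether the paper proceeds this way is not stated in
the abstract.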

Some refinements and general comments on the network algorithms are 
given. 


Farkas' theorem, 1, 13
Fat formulation, 104
Feasibility, 104
  closed, 31-33
  open, 29-31
Feasibility theorem, 27
Feasible solution, 207, 286
  basic, 19, 150, 181
  dual, 207, 286
  permanently, 104
Flows, maximal, 321-323, 325
  minimal cost, 328
  multi-commodity, 323-324, 333
  multi-terminal, 329
  network, 319-333
Free-energy function, 255-262

Gauss-Jordan elimination, 1, 3, 18, 87, 101, 285
Generalized inverse, 37
Global solution, 67
Gradient, 68, 108
  direct differential method, 70-71
  Lagrangian differential method, 71-72
  projected, 73-76, 87, 161, 171-173
  interior, 37
  reduced, 76-77



Homogeneous linear programs, 13
Hungarian method, 26, 341

Integer programming, 269-318
Inventory problem, 105, 111
Inverse, of the basis, 125, 209
  generalized, 37
  product form, 125, 147, 191, 211-218

Jacobian, 166



Kuhn-Tucker conditions, 55-61, 73, 75, 111, 161-167

Lagrange multipliers, 101-102, 109, 260
Lexicographic order, 20, 92, 206-207, 269, 284-287
Local solution, 67
  optimum, 93

Markov process, 113, 237
Matrices, elementary, 127, 130
  permutation, 41-42, 132, 136
  row operations on, 130
  triangular, 126-128, 133
Maximal flows, 321-323, 325
Maximum transform, 35
Minimal cost flows, 328
Module, 272-275, 278-281
Multi-commodity flows, 323-324, 333
Multi-stage problems, 125-148, 239-240
Multi-terminal flows, 329
Multiple solutions, 14-15
Multiplex method, 87
Multipliers, Lagrange, 101-102, 109, 260

Network, algorithms, 343-349
  cuts, 321, 333
  shortest path, 343-344
  sources and sinks, 321-322
Network flows, 319-333
  (See also Flows)
Nonbasic variables, 17
Nonlinear programming, 55, 67-86, 89, 123

Objective function, stochastic, 103-104
Open feasibility, 29-31
Optimal solution, 19
Orthogonal solutions, 2
Orthogonal subspaces, 2
Orthogonality, 65
"Out of kilter" method, 26

Parametric linear programming, 149, 152-153, 201-210
Partition programming, 174-175
Permanently feasible solution, 104
Permutation, 5, 8
Permutation matrix, 41-42, 132, 136
Personnel assignment problem, 267
PERT, 117-118, 328
Phase I, 181-186
Phase II, 181, 186-189
Pivot column, 146-148, 181
Pivot entry, 23, 212
Pivot row, 146-148, 181
Pivot step, 4, 17-18, 139
Polar set, 28
Positive definite, 101, 303
Positive semi-definite, 63, 72, 101
Primal-dual algorithm, 24
Problems, critical path, 116-118, 337
  inventory, 105, 111
  personnel assignment, 267
  transportation, 319, 335-336
  traveling salesman, 339-341, 343
  two-stage, 105-108, 115
Projected gradient (See Gradient)
Pseudo-basic variable, 133-148

Quadratic function, 101
Quadratic programming, 55, 104, 111, 121, 224, 303-308
  simplex methods, 72-73
Quadratic programs, 63, 65, 68

Reduced cost, 77, 93, 181
Reduced gradient, 76-77
Risk, 103, 104
  (See also Uncertainty)

SCEMP, 177, 211
Self-dual, 63
Separable functions, 79, 89, 111
Separable programming, 77-80
  local, 89-100
Simplex method (algorithm), 1, 9, 12, 17-26, 72-73, 90-91, 192, 205, 219
Simplex method,
  cycling, 21, 111, 181, 212
  iteration, 181
  primal-dual, 20
  mutual, 17-26
  revised, 148
  explicit, 192
  product form, 125, 192, 211-218
Singleton, 182
Stochastic constraints, 104
Stochastic objective function, 103-104
Stochastic programming, 111, 113, 121, 123, 159, 223-237

Tableau, 180
Transformation, unimodular, 40-43, 271, 289, 293, 323
Transportation problem, 319, 335-336
Traveling salesman problem, 339-341, 343
Tree, 46
Triangular matrices, 126-128, 133
Triangularization, 125-132, 133
Triangulation, 96-97
  cubical, 97-98
Two-stage problems, 105-108, 115

Uncertainty, 103-110, 114, 133
  decision rules, 114
Unimodular dual linear systems, 7
Unimodular sets, 39-53
  maximal, 45, 46, 52
  totally, 45
Unimodular transformation, 40-43, 271, 289, 293, 323

Variables, artificial, 135, 181, 219
  basic, 17, 136-140, 180
  nonbasic, 17
  pseudo-basic, 133-148